How to extract numbers from strings in Python?

I want to extract all the numbers contained in a string. Which is better suited to this, a regular expression or the isdigit() method?

Example:

line = "hello 12 hi 89"

Result:

[12, 89]

Answer #1

@jmnas, I love your answer, but it doesn't find floating-point numbers. I am working on a script that parses code to be fed into a CNC milling machine, and I need to find X and Y dimensions that can be integers or floating-point numbers, so I changed the code to the following. It finds ints and floats, with positive and negative values. It still can't find values in hexadecimal format, but you could add "x" and "A" through "F" to the num_char tuple so that something like "0x23AC" is collected; the final conversion would then need int(num, 16), because float() does not accept hexadecimal strings.

s = 'hello X42 I\'m a Y-32.35 string Z30'
xy = ("X", "Y")
num_char = (".", "+", "-")

l = []

tokens = s.split()
for token in tokens:
    if token.startswith(xy):
        num = ""
        for char in token:
            # print(char)
            if char.isdigit() or (char in num_char):
                num = num + char
        try:
            l.append(float(num))
        except ValueError:
            pass

print(l)
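The hexadecimal idea mentioned above can also be handled with a regular expression instead of a character tuple. The following is only a sketch under assumptions: the input string is hypothetical, hexadecimal values are assumed to always carry a "0x" prefix, and the helper name to_number is made up for illustration.

```python
import re

# Hypothetical input; the "0x" prefix is assumed to mark hexadecimal values.
s = "X0x23AC Y-32.35 Z30"

# Try the hexadecimal alternative first, then a signed decimal number.
pattern = re.compile(r"0[xX][0-9a-fA-F]+|[-+]?\d*\.?\d+")


def to_number(text):
    """Convert a matched token to int (hexadecimal) or float (decimal)."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return float(text)


values = [to_number(m) for m in pattern.findall(s)]
print(values)  # [9132, -32.35, 30.0]
```

Unlike the loop above, this sketch does not filter on the leading axis letter, so it also picks up the Z value.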

Answer #2

This is a bit late, but you can also extend the regular expression to handle scientific notation.

import re

# Format is [(<string>, <expected output>), ...]
ss = [
    ("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
     ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
    ('hello X42 I\'m a Y-32.35 string Z30',
     ['42', '-32.35', '30']),
    ('he33llo 42 I\'m a 32 string -30',
     ['33', '42', '32', '-30']),
    ('h3110 23 cat 444.4 rabbit 11 2 dog',
     ['3110', '23', '444.4', '11', '2']),
    ('hello 12 hi 89',
     ['12', '89']),
    ('4',
     ['4']),
    ('I like 74,600 commas not,500',
     ['74,600', '500']),
    ('I like bad math 1+2=.001',
     ['1', '+2', '.001']),
]

for s, r in ss:
    # Raw string so that \d reaches the regex engine unchanged.
    rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)

All of the test cases pass.
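The pattern above returns its matches as strings. If numeric values are needed, one possible follow-up step is to strip the thousands separators and call float() on each match. This is a minimal sketch; the names NUMBER and extract_floats are made up for illustration and are not part of the answer.

```python
import re

# Same pattern as in the answer, written as a raw string.
NUMBER = re.compile(r"[-+]?[.]?\d+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?")


def extract_floats(text):
    """Return every number matched in text as a float."""
    # Remove thousands separators, which float() does not accept.
    return [float(m.replace(",", "")) for m in NUMBER.findall(text)]


print(extract_floats("I like 74,600 commas not,500"))  # [74600.0, 500.0]
print(extract_floats("hello 12 hi 89"))                # [12.0, 89.0]
```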

In addition, you can look at the regular expressions built into AWS Glue.

Answer #3

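The best option I have found is the following. It extracts a single number and discards every non-digit character. Note that this includes the decimal point and the sign, so all digits in the string are concatenated into one value.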

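def extract_nbr(input_str):
    # Treat None and the empty string as "no number".
    if input_str is None or input_str == '':
        return 0
    # Keep only the digit characters, in order.
    out_number = ''
    for ele in input_str:
        if ele.isdigit():
            out_number += ele
    # Avoid float('') raising ValueError when there are no digits.
    return float(out_number) if out_number else 0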

Answer #4

This answer also handles the case where the number in the string is a floating-point value.

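def get_first_nbr_from_str(input_str):
    '''
    :param input_str: string that contains digits and words
    :return: the first number found in input_str, or 0 if there is none

    demo:
    'ab324.23.123xyz': 324.23
    '.5abc44': 0.5
    '''
    # "or" rather than "and": reject empty input as well as non-strings.
    if not input_str or not isinstance(input_str, str):
        return 0
    out_number = ''
    for ele in input_str:
        if (ele == '.' and '.' not in out_number) or ele.isdigit():
            out_number += ele
        elif out_number:
            # The first number has ended; stop scanning.
            break
    try:
        return float(out_number)
    except ValueError:
        # No digits were collected (out_number is '' or '.').
        return 0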

Answer #5

If you know that the string contains only one number, as in "hello 12 hi", you can try filter().

For example:

In [1]: int(''.join(filter(str.isdigit, '200 grams')))
Out[1]: 200

In [2]: int(''.join(filter(str.isdigit, 'Counters: 55')))
Out[2]: 55

In [3]: int(''.join(filter(str.isdigit, 'more than 23 times')))
Out[3]: 23

But be careful, because digits from separate numbers are joined together:

In [4]: int(''.join(filter(str.isdigit, '200 grams 5')))
Out[4]: 2005
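One way to avoid the pitfall above is to match each run of digits separately with re.findall, which keeps "200" and "5" apart. This is a minimal sketch for unsigned integers only; the function name extract_ints is made up for illustration, and signs and decimal points are ignored.

```python
import re


def extract_ints(text):
    """Return each run of consecutive digits in text as a separate int."""
    return [int(m) for m in re.findall(r"\d+", text)]


print(extract_ints("200 grams 5"))     # [200, 5]
print(extract_ints("hello 12 hi 89"))  # [12, 89]
```

The second call reproduces the result asked for in the question.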

12 February 2020, 09:58 | Views: 7327
