Data Cleaning In Python (Practical Examples)



Data Cleaning In Python with Pandas
In this tutorial, we will look at some practical issues that come up when working with data, how to diagnose them, and how to solve them.
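
A minimal sketch of the diagnosis step follows. It assumes a small made-up DataFrame: the column names and values are hypothetical and are not the tutorial's data set. It relies only on pandas' own inspection tools (isna, duplicated and dtypes), which already surface most of the issues listed below. The helper name diagnose is also hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the tutorial's data set) with typical problems:
# a padded column name, numbers stored as text, a missing value and a repeated row.
df = pd.DataFrame({
    " Name ": ["Ann", "Bob", "Bob", "Cara"],
    "Age": ["34", "29", "29", "unknown"],
    "Salary": [52000.0, np.nan, np.nan, 61000.0],
})


def diagnose(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: collect a few quick data-quality checks into one report."""
    return {
        # Raw names reveal stray spaces or inconsistent casing.
        "columns": list(frame.columns),
        # A text dtype on a numeric-looking column hints at mixed types.
        "dtypes": frame.dtypes.astype(str).to_dict(),
        # Number of missing values per column.
        "missing": frame.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row (pandas treats NaN as equal here).
        "duplicates": int(frame.duplicated().sum()),
    }


for check, result in diagnose(df).items():
    print(f"{check}: {result}")
```

In an interactive session, df.info() and df.describe() give a similar overview in one call. The dictionary form is simply easier to compare or log.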

==Tutorial and Data Set here==
GitHub:
Blog:

Get More Here – Building ML Web Apps

===Great Books For Mastering Data Science and Data Cleaning===
Python For Data Analysis :
Python Data Science Handbook:
Hands-On Machine Learning with Scikit-Learn & TensorFlow:
Python Machine Learning by Sebastian Raschka:
Python Cookbook:

Reference
====Common Data Cleaning Issues====
Reading File
Inconsistent Column Names
Missing Data
Duplicates
Inconsistent Data Types
Outliers
Noisy Data
etc.
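
The next sketch assumes pandas 1.3 or newer and a small made-up CSV. The names, ages and salaries in it are hypothetical and are not the tutorial's data set. Under those assumptions, each issue in the list above can be handled with standard pandas calls. The individual choices are illustrative assumptions and not necessarily what the video does: median imputation for age, dropping rows without a salary, and the 1.5 × IQR outlier rule.

```python
import io

import pandas as pd

# Hypothetical raw CSV text standing in for a messy file. The "Dan" line has one
# field too many, which is the kind of line that raises
# "ParserError: Error tokenizing data" with the default settings.
RAW_CSV = """ Name ,Age,Salary
Ann,34,52000
Bob,29,48000
 Bob,29,48000
Cara,unknown,61000
Dan,41,58000,extra
Eve,38,
Finn,45,55000
Gus,31,990000
"""

# 1. Reading the file: skip malformed lines instead of failing (pandas >= 1.3).
df = pd.read_csv(io.StringIO(RAW_CSV), on_bad_lines="skip")

# 2. Inconsistent column names: strip spaces, lower-case, use underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

# 3. Noisy text values: the stray space in " Bob" would hide a duplicate row.
df["name"] = df["name"].str.strip()

# 4. Duplicates: drop fully repeated rows and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)

# 5. Inconsistent data types: "unknown" is not a number, so coerce it to NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# 6. Missing data: one strategy per column, chosen only for illustration.
df["age"] = df["age"].fillna(df["age"].median()).astype(int)  # impute with the median
df = df.dropna(subset=["salary"]).reset_index(drop=True)       # or drop the row

# 7. Outliers: flag values outside 1.5 * IQR instead of deleting them.
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df["salary_outlier"] = ~df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df)
```

Skipping malformed lines silently discards data, so on_bad_lines="warn" is the safer setting while exploring. Before pandas 1.3, the equivalent switch was error_bad_lines=False. An error such as "Expected 1 fields in line 4, saw 3" often means the file starts with a title or metadata line. In that case, skiprows= (or the correct sep=) is a better fix than dropping rows.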

If you liked the video, don’t forget to leave a like or subscribe.
If you need any help, just message me in the comments; you never know, it might help someone else too.
J-Secur1ty JCharisTech


Source: https://benjaminjcohen.com/


23 Comments

  1. You are truly a blessing. God bless you richly.

  2. Hi, I need your help with cleaning my dataset. How can I contact you?

  3. What should I do when there is just one column whose name is itself an observation, and the values in that column (including the name) contain more information?

    example:
    index William Smith (1655 – 2008)
    0 James Isac (1992)
    1 Sofie (121-122)

  4. PLEASE HELP!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    CODE::
    import pandas as pd

    file1 = r'D:C_pHBook1 Lat_LongVspH.csv';
    df = pd.read_csv(file1)

    SHOWS ERROR:

    ParserError Traceback (most recent call last)
    <ipython-input-2-21d97f424a6d> in <module>
    ----> 1 df = pd.read_csv(file1)

    C:\Users\KIIT\AppData\Roaming\Python\Python37\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    683 )
    684
    --> 685 return _read(filepath_or_buffer, kwds)
    686
    687 parser_f.__name__ = name

    C:\Users\KIIT\AppData\Roaming\Python\Python37\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    461
    462 try:
    --> 463 data = parser.read(nrows)
    464 finally:
    465 parser.close()

    C:\Users\KIIT\AppData\Roaming\Python\Python37\site-packages\pandas\io\parsers.py in read(self, nrows)
    1152 def read(self, nrows=None):
    1153 nrows = _validate_integer("nrows", nrows)
    -> 1154 ret = self._engine.read(nrows)
    1155
    1156 # May alter columns / col_dict

    C:\Users\KIIT\AppData\Roaming\Python\Python37\site-packages\pandas\io\parsers.py in read(self, nrows)
    2057 def read(self, nrows=None):
    2058 try:
    -> 2059 data = self._reader.read(nrows)
    2060 except StopIteration:
    2061 if self._first_chunk:

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

    ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3

  5. https://github.com/Dangieboy/CSV-Data-Cleaner

  6. Hey guys, check out my self-made CSV data cleaner!
    https://github.com/Dangieboy/CSV-Data-Cleaner/blob/master/README.md

  7. I didn’t know it’s called data cleaning lol

  8. Hello, you have a Ghanaian accent, if I'm not mistaken. Anyway, nice tutorials. I'm new to data science and this video was very helpful. Keep them coming!

  9. For missing data, which solution is better and more reliable?

  10. This has been one of your best videos. Would you be able to do a tutorial on Feature Engineering incl. Feature Extraction? Thank you in anticipation

  11. I am unable to download the data file. Please help.

  12. Wow, your video helped me in a desperate time. If I get this job because of this video, I owe you big time! Thank you so much.

  13. Excellent, helped a bunch. Thanks!

  14. How do I clean the dataset in Jupyter? Can you please explain?

  15. Your tutorial is the best data cleaning tutorial.

  16. If a particular element in a row has a string value while the rest of that column is integer,
    how do you clean that data?

  17. Many thanks, I found this extremely useful. I'm new to Python and programming, so it was useful for me as a beginner too.

  18. Hello, very interesting tutorial.
    I am a bit of a newbie and I am stuck: I don't know how to clean my data or how to get past the error my data produces. A little help would be appreciated, sir.
    Here it is:

    import os
    import pandas as pd
    import scipy as sp
    import matplotlib as mpl
    import seaborn as sns

    df = pd.read_csv("mars-2014-complete.csv")
    ---------------------------------------------------------------------------
    ParserError Traceback (most recent call last)
    <ipython-input-10-66683b903f07> in <module>()
    ----> 1 df = pd.read_csv("mars-2014-complete.csv")

    ~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676 skip_blank_lines=skip_blank_lines)
    677
    --> 678 return _read(filepath_or_buffer, kwds)
    679
    680 parser_f.__name__ = name

    ~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    444
    445 try:
    --> 446 data = parser.read(nrows)
    447 finally:
    448 parser.close()

    ~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
    1034 raise ValueError('skipfooter not supported for iteration')
    1035
    -> 1036 ret = self._engine.read(nrows)
    1037
    1038 # May alter columns / col_dict

    ~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
    1846 def read(self, nrows=None):
    1847 try:
    -> 1848 data = self._reader.read(nrows)
    1849 except StopIteration:
    1850 if self._first_chunk:

    pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

    pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

    pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

    pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

    pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

    ParserError: Error tokenizing data. C error: Expected 8 fields in line 103, saw 9

  19. Please make some more videos on Data Cleaning In Python with Pandas. Thank you….
