Tags: aarya tadvalkar, csv, f string, kgp talkie, python, text files in python, tsv

Working with Text Files in Python for NLP

Learn to read, write, and process text, CSV, TSV, and PDF files in Python. Covers f-strings, file I/O operations, and Jupyter %%writefile for NLP workflows.

May 16, 2026 at 8:15 AM · 10 min read

Topics You Will Master

f-string formatting for structured and dynamic text output
Reading and writing .CSV and .TSV files with pandas
Using Jupyter %%writefile magic to create text files in notebooks
Python built-in file I/O: open, read, write, and close patterns
Handling character encoding and file path management for NLP
Best For

Python beginners preparing and ingesting text data for NLP experiments.

Expected Outcome

Practical command of Python file I/O operations for text preprocessing workflows.

Working with text files

  • Working with f-strings for formatted printing
  • Reading and writing .CSV and .TSV files
  • Using %%writefile to create simple .txt files (works in Jupyter notebooks only)
  • Reading and writing files with Python's built-in functions

String Formatter

String formatting lets us display strings in a specified layout. This makes output easier to read and easier to process later.

PYTHON
name = 'KGP Talkie'

The format() method formats the specified value(s) and inserts them into the string's placeholders. A placeholder is written with curly brackets: {}.

PYTHON
print('The YouTube channel is {}'.format(name))
OUTPUT
The YouTube channel is KGP Talkie

To create an f-string, prefix the string with the letter f. The string can then be formatted in much the same way as with str.format(). F-strings provide a concise and convenient way to embed Python expressions inside string literals.

PYTHON
print(f'The YouTube channel is {name}')
OUTPUT
The YouTube channel is KGP Talkie
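
Because the braces of an f-string hold full Python expressions, not just variable names, values can be computed or formatted inline. The following is a minimal sketch of that idea. The variables subscribers and ratio are made up for illustration and do not appear elsewhere in this lesson.

```python
name = 'KGP Talkie'
subscribers = 12500   # hypothetical value, for illustration only
ratio = 0.4567        # hypothetical value, for illustration only

# Any expression can go inside the braces, including method calls.
line1 = f'Channel: {name.upper()} ({len(name)} characters)'

# A format spec after the colon controls how the value is rendered.
line2 = f'Subscribers: {subscribers:,}'   # thousands separator
line3 = f'Ratio: {ratio:.1%}'             # percentage with one decimal place

# str.format() accepts the same format specs.
line4 = 'Ratio: {:.1%}'.format(ratio)

for line in (line1, line2, line3, line4):
    print(line)
```

The last two lines produce the same text, which shows that the two styles share one format-spec syntax and differ only in where the value is written.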

Now we are going to see how to work with minimum width and alignment between the columns. Here we have created a list of tuples.

PYTHON
data_science_tuts = [('Python for Beginners', 19),
                    ('Feature Selectiong for Machine Learning', 11),
                    ('Machine Learning Tutorials', 11),
                    ('Deep Learning Tutorials', 19)]
data_science_tuts
OUTPUT
[('Python for Beginners', 19), ('Feature Selectiong for Machine Learning', 11), ('Machine Learning Tutorials', 11), ('Deep Learning Tutorials', 19)]

First we will print the contents of the list without any formatting or alignment.

PYTHON
for info in data_science_tuts:
    print(info)
OUTPUT
('Python for Beginners', 19)
('Feature Selectiong for Machine Learning', 11)
('Machine Learning Tutorials', 11)
('Deep Learning Tutorials', 19)

Now we will print the same data with proper alignment. Here info[0] is the first value of each tuple and info[1] is the second. {50} and {10} set the minimum width of each column, so shorter values are padded with spaces.

PYTHON
for info in data_science_tuts:
    print(f'{info[0]:{50}} {info[1]:{10}}')
OUTPUT
Python for Beginners                                       19
Feature Selectiong for Machine Learning                    11
Machine Learning Tutorials                                 11
Deep Learning Tutorials                                    19
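
The alignment of a field is controlled by the character placed before the width:

  • :< Forces the field to be left-aligned within the available space (this is the default for most objects).
  • :> Forces the field to be right-aligned within the available space (this is the default for numbers).
  • :^ Forces the field to be centered within the available space.

A character placed before the alignment symbol is used as the fill character. In the example below, . fills the padding with dots.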

PYTHON
for info in data_science_tuts:
    print(f'{info[0]:{50}} {info[1]:.>{10}}')
OUTPUT
Python for Beginners                               ........19
Feature Selectiong for Machine Learning            ........11
Machine Learning Tutorials                         ........11
Deep Learning Tutorials                            ........19
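
The fill, alignment, and width options can be combined in a small helper. This is a sketch only: format_row and its default widths are hypothetical names and values chosen for the example, not part of the lesson's code.

```python
rows = [('Python for Beginners', 19),
        ('Deep Learning Tutorials', 19)]


def format_row(title, count, title_width=30, count_width=6):
    """Return one row: title left-aligned, count right-aligned with dot fill."""
    return f'{title:<{title_width}}{count:.>{count_width}}'


for title, count in rows:
    print(format_row(title, count))

# The three alignment options applied to the same value and width.
left = f'{"NLP":<9}|'      # padded on the right
right = f'{"NLP":>9}|'     # padded on the left
center = f'{"NLP":*^9}|'   # padded on both sides, with * as the fill character
print(left)
print(right)
print(center)
```

Nesting the width in its own braces, as in {title_width}, lets the column size be chosen at run time, for example from the longest title in the data.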

Working with .CSV or .TSV Files

Now we will see how to work with CSV (Comma-Separated Values) and TSV (Tab-Separated Values) files.

The first step is to read such files. We will use pandas to read the files.

PYTHON
import pandas as pd

read_csv() is the pandas function for reading CSV files. It can read TSV files as well when sep = '\t' is set, which means the separator is a tab. head() returns the first 5 rows of the DataFrame.

PYTHON
data = pd.read_csv('moviereviews.tsv', sep = '\t')
data.head()
OUTPUT
   label  review
0  neg    how do films like mouse hunt get into theatres...
1  neg    some talented actresses are blessed with a dem...
2  pos    this has been an extraordinary year for austra...
3  pos    according to hollywood movies made in last few...
4  neg    my first press screening of 1998 and already i...

The shape attribute of a pandas DataFrame stores the number of rows and columns as a tuple (number of rows, number of columns). The data read with read_csv() has 2000 rows and 2 columns.

PYTHON
data.shape
OUTPUT
(2000, 2)

The value_counts() function returns a Series containing the counts of unique values. The result is sorted in descending order, so the first element is the most frequently occurring value. We have called value_counts() on data['label'], the column named label. It has 1000 occurrences of neg and 1000 occurrences of pos.

PYTHON
data['label'].value_counts()
OUTPUT
neg    1000
pos    1000
Name: label, dtype: int64

Next we filter with the condition data['label']=='pos'. This keeps only the rows that have pos in their label column.

PYTHON
pos = data[data['label']=='pos']
pos.head()
OUTPUT
    label  review
2   pos    this has been an extraordinary year for austra...
3   pos    according to hollywood movies made in last few...
11  pos    with stars like sigourney weaver ( " alien " t...
16  pos    i remember hearing about this film when it fir...
18  pos    garry shandling makes his long overdue starrin...
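
To see why the filtered rows keep index labels such as 2, 3, and 11, it helps to look at the condition on its own. The sketch below uses a small made-up DataFrame in place of moviereviews.tsv, so the reviews shown are placeholders rather than real data.

```python
import pandas as pd

# Made-up rows standing in for the movie review data.
data = pd.DataFrame({
    'label': ['neg', 'pos', 'pos', 'neg'],
    'review': ['dull plot', 'great cast', 'lovely score', 'too long'],
})

# The comparison produces one True/False value per row.
mask = data['label'] == 'pos'
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
pos = data[mask]
print(pos)

# The original index labels are preserved; reset_index renumbers from 0.
pos_renumbered = pos.reset_index(drop=True)
print(pos_renumbered)
```

Filtering never renumbers rows by itself, which is why the index of the filtered frame has gaps until it is reset or the data is written out without the index and read back.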

The to_csv() method saves a pandas DataFrame to a delimited text file. Setting sep = '\t' stores the DataFrame pos as a TSV file, and index = False leaves the row index out of the file.

PYTHON
pos.to_csv('pos.tsv', sep = '\t', index = False)
pd.read_csv('pos.tsv', sep = '\t').head()
OUTPUT
   label  review
0  pos    this has been an extraordinary year for austra...
1  pos    according to hollywood movies made in last few...
2  pos    with stars like sigourney weaver ( " alien " t...
3  pos    i remember hearing about this film when it fir...
4  pos    garry shandling makes his long overdue starrin...
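
When pandas is not available, the standard-library csv module can do a similar round trip. This is a sketch of an alternative approach, not the method used in this lesson. The file name reviews.tsv and the two rows are made up for the example.

```python
import csv
import os
import tempfile

rows = [{'label': 'pos', 'review': 'great cast'},
        {'label': 'neg', 'review': 'dull plot'}]

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'reviews.tsv')

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['label', 'review'], delimiter='\t')
    writer.writeheader()
    writer.writerows(rows)

# DictReader uses the header row as keys for every following row.
with open(path, newline='', encoding='utf-8') as f:
    loaded = list(csv.DictReader(f, delimiter='\t'))

print(loaded)
```

The csv module returns every value as a string and has no equivalent of value_counts(), so pandas remains the more convenient choice once the data needs to be analysed.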

Built-in magic command in Jupyter: %%writefile

%%writefile writes the contents of the cell to a file. Here the content will be written into text1.txt.

PYTHON
%%writefile text1.txt
Hello, this is the NLP lesson.
Please Like and Subscribe to show your support
OUTPUT
Writing text1.txt

Screenshot of text1.txt file contents showing the NLP lesson text written with %%writefile magic command

The -a flag appends the contents of the cell to an existing file. The file will be created if it does not exist.

PYTHON
%%writefile -a text1.txt
Thanks for watching
OUTPUT
Appending to text1.txt

Screenshot of text1.txt after appending a third line using %%writefile -a flag

Using Python's built-in functions to read and write text files

The open() function opens a file and returns it as a file object. There are various modes in which you can open the file. Some of the basic modes are:

  • "r" - Read - Default value. Opens a file for reading, raises an error if the file does not exist
  • "a" - Append - Opens a file for appending, creates the file if it does not exist
  • "w" - Write - Opens a file for writing, creates the file if it does not exist and erases its contents if it does
  • "x" - Create - Creates the specified file, raises an error if the file already exists
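
The behaviour of the four modes can be checked with a short experiment. This is a minimal sketch that works in a temporary directory so that no real file is touched. The file name modes_demo.txt is made up for the example.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'r' on a missing file raises FileNotFoundError.
try:
    open(path, 'r')
    missing_error = False
except FileNotFoundError:
    missing_error = True

# 'w' creates the file; reopening with 'w' erases what was there.
with open(path, 'w') as f:
    f.write('first\n')
with open(path, 'w') as f:
    f.write('second\n')

# 'a' keeps the existing content and adds to the end.
with open(path, 'a') as f:
    f.write('third\n')

# 'x' refuses to overwrite an existing file.
try:
    open(path, 'x')
    exists_error = False
except FileExistsError:
    exists_error = True

with open(path) as f:
    content = f.read()
print(repr(content))
```

The line 'first' is gone from the final content because the second 'w' open truncated the file. Mode 'x' is the safer option when an existing file must never be overwritten by accident.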

We have opened the file in the read mode.

PYTHON
file = open('text1.txt', 'r')
file

The read() method returns the specified number of characters from a file opened in text mode (or bytes in binary mode). The default is -1, which means the whole file.

PYTHON
file.read()
OUTPUT
'Hello, this is the NLP lesson.\nPlease Like and Subscribe to show your support\nThanks for watching\n'

If we read the same file again we will get an empty string. This is because the file pointer has reached the end of the file.

PYTHON
file.read()
OUTPUT
''

seek() sets the file's current position at the offset specified. We have specified the offset as 0. Hence the file pointer will be set at the start of the file.

PYTHON
file.seek(0)
OUTPUT
0

Now if we read the file we will not get an empty string.

PYTHON
file.read()
OUTPUT
'Hello, this is the NLP lesson.\nPlease Like and Subscribe to show your support\nThanks for watching\n'
PYTHON
file.seek(0)
OUTPUT
0
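
The interaction between read() and the file pointer can be sketched in one place. The example writes its own one-line file in a temporary directory, under the made-up name seek_demo.txt, so it does not depend on text1.txt.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')
with open(path, 'w') as f:
    f.write('Hello, this is the NLP lesson.\n')

with open(path, 'r') as f:
    first = f.read(5)    # read only the first five characters
    rest = f.read()      # continue from where the pointer stopped
    at_end = f.read()    # nothing left, so this is an empty string
    f.seek(0)            # move the pointer back to the start
    again = f.read(5)    # the same five characters as before

print(repr(first), repr(rest), repr(at_end), repr(again))
```

Each read() continues from the position where the previous one stopped, so first and rest together make up the whole file.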

readline() reads one entire line from the file. If we call it the second time it will read the second line.

PYTHON
file.readline()
OUTPUT
'Hello, this is the NLP lesson.\n'
PYTHON
file.seek(0)
OUTPUT
0

readlines() reads until EOF (end of file) and returns a list containing the lines.

PYTHON
file.readlines()
OUTPUT
['Hello, this is the NLP lesson.\n', 'Please Like and Subscribe to show your support\n', 'Thanks for watching\n']
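
readlines() loads every line into memory at once, which is fine for a three-line file. For a large corpus, a file object can also be iterated directly so that only one line is held at a time. The sketch below assumes a small file created on the spot under the made-up name lines_demo.txt.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, with the lesson's three lines.
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
with open(path, 'w') as f:
    f.write('Hello, this is the NLP lesson.\n'
            'Please Like and Subscribe to show your support\n'
            'Thanks for watching\n')

line_count = 0
word_count = 0
with open(path) as f:
    # Iterating over the file object yields one line per loop step.
    for line in f:
        line_count += 1
        word_count += len(line.split())

print(line_count, word_count)
```

Splitting on whitespace is only a rough word count, since punctuation stays attached to the words, but it is enough to show the line-by-line pattern.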

It is good practice to call the close() method after performing all operations on a file. After you close a file you cannot read from or write to it, but the file object itself is still available.

PYTHON
file.close()
file

If we do not want to explicitly close the file we can read the file in the following way.

PYTHON
with open('text1.txt') as file:
    text_data = file.readlines()
    print(text_data)
OUTPUT
['Hello, this is the NLP lesson.\n', 'Please Like and Subscribe to show your support\n', 'Thanks for watching\n']
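
That the with block really closes the file can be verified through the closed attribute of the file object. This is a small sketch using a temporary file with the made-up name with_demo.txt.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(path, 'w') as f:
    f.write('Thanks for watching\n')

with open(path) as f:
    closed_inside = f.closed   # still open inside the block
    text = f.read()

closed_after = f.closed        # closed as soon as the block ends

# Reading from a closed file raises ValueError.
try:
    f.read()
    raised = False
except ValueError:
    raised = True

print(closed_inside, closed_after, raised)
```

The file is closed even if the code inside the block raises an exception, which is the main reason the with form is preferred over calling close() by hand.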

strip() returns a copy of the string with leading and trailing whitespace removed, including the newline at the end of each line. Passing an argument removes those characters instead.

PYTHON
for temp in text_data:
    print(temp.strip())
OUTPUT
Hello, this is the NLP lesson.
Please Like and Subscribe to show your support
Thanks for watching
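
strip() has two one-sided relatives, lstrip() and rstrip(), which matter when leading indentation carries meaning. The sketch below compares them on a made-up line.

```python
line = '    indented text  \n'   # made-up example line

both_sides = line.strip()          # removes whitespace on both ends
newline_only = line.rstrip('\n')   # removes only the trailing newline
right_side = line.rstrip()         # removes all trailing whitespace

# An argument changes which characters are removed.
dashes = '--title--'.strip('-')

print(repr(both_sides))
print(repr(newline_only))
print(repr(right_side))
print(repr(dashes))
```

For text preprocessing, rstrip('\n') is the more conservative choice, because it leaves every other character of the line untouched.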

The enumerate() function adds a counter to an iterable and returns an enumerate object. This object can be used directly in a for loop or converted into a list of tuples with list().

PYTHON
for i, temp in enumerate(text_data):
    print(str(i) + "  --->  " + temp.strip())
OUTPUT
0  --->  Hello, this is the NLP lesson.
1  --->  Please Like and Subscribe to show your support
2  --->  Thanks for watching
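
The same numbered listing can be written with an f-string, and enumerate() accepts a start argument when the numbering should begin at 1. This is a sketch that repeats the text_data list so that it runs on its own.

```python
text_data = ['Hello, this is the NLP lesson.\n',
             'Please Like and Subscribe to show your support\n',
             'Thanks for watching\n']

# start=1 makes the counter begin at 1 instead of 0.
numbered = [f'{i:>2}  --->  {line.strip()}'
            for i, line in enumerate(text_data, start=1)]

for row in numbered:
    print(row)

# list() shows the (counter, item) tuples that enumerate produces.
pairs = list(enumerate(['a', 'b']))
print(pairs)
```

The :>2 width keeps the arrows aligned once the line numbers reach two digits.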

Now we will see how to write a file. For that we will open a file in write ('w') mode.

PYTHON
file = open('text2.txt', 'w')
file

The write() method writes a specified text to the file. It returns the number of characters written.

PYTHON
file.write('This is just another lesson')
OUTPUT
27

If you open text2.txt right now it will probably be empty. Python buffers writes in memory, so the text may not reach the disk until the file is flushed or closed.

PYTHON
file.close()

Screenshot of text2.txt file created with Python open() in write mode containing 27 characters
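
When a file has to stay open, for example a log written over a long run, flush() pushes the buffered text to the operating system without closing the file. This is a minimal sketch using a temporary file with the made-up name flush_demo.txt.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'flush_demo.txt')

f = open(path, 'w')
written = f.write('This is just another lesson')   # number of characters written

# flush() hands the buffered text to the operating system.
f.flush()

# A second handle can now see the text even though f is still open.
with open(path) as reader:
    visible_after_flush = reader.read()

f.close()
print(written, repr(visible_after_flush))
```

close() flushes automatically, so an explicit flush() is only needed while the file remains open.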

An alternative way to write a file is given below. The with block closes the file automatically, so an explicit close() is not needed.

PYTHON
with open('text3.txt', 'w') as file:
    file.write('This is third file \n')

Screenshot of text3.txt file written using Python context manager with open() showing single line

PYTHON
text_data
OUTPUT
['Hello, this is the NLP lesson.\n', 'Please Like and Subscribe to show your support\n', 'Thanks for watching\n']

Now we will open text3.txt in append mode and append the content of text_data to it.

PYTHON
with open('text3.txt', 'a') as file:
    for temp in text_data:
        file.write(temp)

Screenshot of text3.txt after appending text_data lines using open() in append mode
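
The loop over text_data can also be replaced by a single writelines() call. The sketch below additionally passes encoding='utf-8' and builds the path with pathlib, two habits that help when text files move between machines, because the default encoding depends on the platform. The file name text3_demo.txt and the temporary directory are assumptions for the example.

```python
import tempfile
from pathlib import Path

text_data = ['Hello, this is the NLP lesson.\n',
             'Please Like and Subscribe to show your support\n',
             'Thanks for watching\n']

# Hypothetical file inside a temporary directory.
path = Path(tempfile.mkdtemp()) / 'text3_demo.txt'

with open(path, 'w', encoding='utf-8') as f:
    f.write('This is third file \n')

# writelines() writes each string as it is; it does not add newlines.
with open(path, 'a', encoding='utf-8') as f:
    f.writelines(text_data)

lines = path.read_text(encoding='utf-8').splitlines()
print(lines)
```

Because writelines() adds no line breaks of its own, it works here only because every string in text_data already ends with a newline.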

Conclusion

In this tutorial you learned Python's core file I/O patterns for NLP data preparation: f-string formatting for readable output, reading and writing CSV/TSV datasets with pandas, and direct operations on plain text files using Python's built-in open().

Key takeaways:

  • f-strings (f'{name}') are a concise alternative to str.format() for string interpolation. Use a width such as :{50} and a fill-and-align spec such as :.>{10} for column alignment in reports.
  • pandas.read_csv(path, sep='\t') reads TSV files, and the default comma separator reads CSV files; to_csv(..., index=False) exports without the index column.
  • open(path, 'r') returns a file object. Call file.seek(0) before re-reading from the same object, and prefer a with block, which closes the file automatically when the block ends.
  • %%writefile is a Jupyter-only magic command; for production pipelines, use the with open(path, 'w') as f: f.write(...) pattern which works everywhere.
