Interactive NumPy & Pandas Cheat Sheet

This guide is a quick reference for NumPy and Pandas, covering topics from creating NumPy arrays to manipulating Pandas DataFrames.

NumPy (Numerical Python)

NumPy is the fundamental package for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. This section covers the essentials of creating, inspecting, and manipulating NumPy arrays.

Array Creation

From a list

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
# Creates a 1-dimensional NumPy array from a Python list.
# Output: [1 2 3 4 5]

Zeros / Ones / Empty

zeros_arr = np.zeros((3, 4))
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

ones_arr = np.ones((2, 2))
# Output:
# [[1. 1.]
#  [1. 1.]]

empty_arr = np.empty((2, 3))
# Allocates a 2x3 array without initializing it; the values are arbitrary until you assign them.

Range / Linspace

range_arr = np.arange(0, 10, 2)
# Creates an array with values from 0 up to (but not including) 10, with a step of 2.
# Output: [0 2 4 6 8]

lin_arr = np.linspace(0, 1, 5)
# Creates an array with 5 evenly spaced numbers from 0 to 1 (inclusive).
# Output: [0.   0.25 0.5  0.75 1.  ]
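
Whether the stop value is included is the main difference between the two functions above. As a small sketch (the variable names are only illustrative), np.linspace can be asked to leave the stop value out or to report the spacing it used:

```python
import numpy as np

# Default behavior: the stop value (1) is the last sample.
closed = np.linspace(0, 1, 5)

# endpoint=False leaves the stop value out, so the spacing becomes 1/5.
half_open = np.linspace(0, 1, 5, endpoint=False)

# retstep=True additionally returns the spacing between samples.
samples, step = np.linspace(0, 1, 5, retstep=True)

print(closed)
print(half_open)
print(step)
```

With endpoint=False the samples follow the same half-open convention as np.arange.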

Random

rand_int = np.random.randint(0, 10, size=(2, 3))
# Creates a 2x3 array of random integers from 0 to 9 (the upper bound, 10, is excluded).
# Output: (will vary)

rand_float = np.random.rand(2, 2)
# Creates a 2x2 array of random floats in the half-open interval [0.0, 1.0).
# Output: (will vary)
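
If you need the same "random" values on every run, one option is NumPy's Generator interface, which the NumPy documentation recommends for new code. The sketch below assumes an arbitrary seed of 42; the seed value and variable names are illustrative, not part of the original snippets.

```python
import numpy as np

# The seed (42) is an arbitrary choice; any fixed integer gives a repeatable stream.
rng = np.random.default_rng(42)

rand_int = rng.integers(0, 10, size=(2, 3))   # integers in [0, 10)
rand_float = rng.random((2, 2))               # floats in [0.0, 1.0)

# A second generator built from the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
same_ints = rng_again.integers(0, 10, size=(2, 3))

print(np.array_equal(rand_int, same_ints))
```

Here rng.integers and rng.random play the roles of np.random.randint and np.random.rand.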

Array Attributes

arr.shape: Returns a tuple with the dimensions of the array.

arr.ndim: Returns the number of dimensions.

arr.size: Returns the total number of elements.

arr.dtype: Returns the data type of the elements.
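
The four attributes above are easiest to read side by side on one array. A minimal sketch, using a 2x3 array chosen only for illustration:

```python
import numpy as np

# Illustrative 2x3 integer array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

shape = arr.shape   # (2, 3): 2 rows, 3 columns
ndim = arr.ndim     # 2: number of axes
size = arr.size     # 6: total element count (2 * 3)
dtype = arr.dtype   # an integer type; the exact width depends on the platform

print(shape, ndim, size, dtype)
```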

Indexing & Slicing

1D Array

arr = np.array([10, 20, 30, 40, 50])
arr[0]      # Output: 10
arr[1:4]    # Output: [20 30 40]
arr[-1]     # Output: 50

2D Array

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d[0, 1]    # Output: 2 (row 0, col 1)
arr_2d[:, 1]    # Output: [2 5 8] (all rows, col 1)
arr_2d[1:, :2]  # Output: [[4 5], [7 8]] (rows 1 onward, first two columns)

Boolean Indexing

arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3] # Selects elements where the condition is True.
# Output: [4 5]
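
The condition inside the brackets is itself a boolean array, which means masks can be stored, combined, and used for assignment. A short sketch of those three uses (the names mask, middle, and capped are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# The comparison produces a boolean array of the same shape.
mask = arr > 3
selected = arr[mask]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
middle = arr[(arr > 1) & (arr < 5)]

# A mask can also be used on the left side of an assignment.
capped = arr.copy()
capped[capped > 3] = 3

print(mask, selected, middle, capped)
```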

Basic Operations

Arithmetic

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b       # Output: [5 7 9]
a * 2       # Output: [2 4 6]
a * b       # Output: [ 4 10 18]

Aggregation

arr = np.array([[1, 2], [3, 4]])
arr.sum()       # Output: 10
arr.sum(axis=0) # Output: [4 6] (sum of each column)
arr.mean()      # Output: 2.5
arr.max(axis=1) # Output: [2 4] (max of each row)
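
The axis argument names the dimension that is collapsed, which is easy to mix up. The sketch below runs the same sum over each axis; keepdims is an optional extra shown here only to illustrate how the result shape changes.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# axis=0 collapses the rows, leaving one value per column.
col_sums = arr.sum(axis=0)

# axis=1 collapses the columns, leaving one value per row.
row_sums = arr.sum(axis=1)

# keepdims=True keeps the collapsed axis with length 1, giving shape (2, 1).
row_sums_2d = arr.sum(axis=1, keepdims=True)

print(col_sums, row_sums, row_sums_2d.shape)
```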

Reshaping & Concatenation

arr = np.arange(1, 7)
arr.reshape(2, 3)     # Reshapes to a 2x3 array.
# Output:
# [[1 2 3]
#  [4 5 6]]

a = np.array([1, 2])
b = np.array([3, 4])
np.concatenate((a, b)) # Joins the arrays end to end along the existing axis.
# Output: [1 2 3 4]
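
Two variations come up often in practice: letting NumPy infer one dimension with -1, and choosing the axis when joining 2D arrays. A minimal sketch, with m1 and m2 as illustrative inputs:

```python
import numpy as np

arr = np.arange(1, 7)

# -1 asks NumPy to infer that dimension from the total size: 6 elements / 2 columns = 3 rows.
auto = arr.reshape(-1, 2)

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# axis=0 stacks m2 below m1; axis=1 places m2 to the right of m1.
stacked_rows = np.concatenate((m1, m2), axis=0)
stacked_cols = np.concatenate((m1, m2), axis=1)

print(auto.shape, stacked_rows.shape, stacked_cols.shape)
```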

Pandas

Pandas is a powerful library for data manipulation and analysis. It introduces two primary data structures: the Series (1D) and the DataFrame (2D). Pandas is indispensable for tasks like cleaning messy data, merging datasets, and performing complex selections and aggregations. This section covers the core functionalities you'll use daily.

Data Structures

Series

import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
# A one-dimensional labeled array; the NaN entry makes the dtype float64.
# Output:
# 0    1.0
# 1    3.0
# 2    5.0
# 3    NaN
# 4    6.0
# 5    8.0
# dtype: float64

DataFrame

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# A two-dimensional labeled data structure.
# Output:
#    col1 col2
# 0     1    A
# 1     2    B
# 2     3    C

Viewing Data

df.head(): View the first 5 rows (pass n to change the count, e.g. df.head(10)).

df.tail(): View the last 5 rows (also accepts n).

df.info(): Get a concise summary of the DataFrame.

df.describe(): Generate descriptive statistics for numerical columns.

df.shape: Get the (rows, columns) tuple.

df.columns: Get the column labels.
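
To see what these calls return, here is a small sketch that reuses the three-row DataFrame from the Data Structures section; the variable names on the left are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

first_two = df.head(2)       # first 2 rows instead of the default 5
last_one = df.tail(1)        # last row only
stats = df.describe()        # by default, summarizes numeric columns only (col1 here)
shape = df.shape             # (rows, columns)
columns = list(df.columns)   # column labels as a plain list

print(first_two)
print(stats)
print(shape, columns)
```

df.info() is left out because it prints its summary directly instead of returning a value.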

Selection

Column Selection

df['col1']          # Selects a single column (returns a Series).
df[['col1', 'col2']]  # Selects multiple columns (returns a DataFrame).

Row Selection (by label - `loc`)

df.loc[0]              # Selects row with label 0.
df.loc[0:2, 'col1']    # Selects rows labeled 0 through 2 (end label included) for 'col1'.

Row Selection (by position - `iloc`)

df.iloc[0]             # Selects row at position 0.
df.iloc[0:2, 0]        # Selects rows at positions 0 and 1 (end position 2 excluded) for the column at position 0.
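
With the default integer index, loc and iloc look deceptively alike. The sketch below puts the same slice through both, then relabels the rows with hypothetical labels 'x', 'y', 'z' to show that loc follows labels while iloc follows positions.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# Label-based slice: the end label 2 is included, so three values come back.
label_slice = df.loc[0:2, 'col1']

# Position-based slice: the end position 2 is excluded, so two values come back.
position_slice = df.iloc[0:2, 0]

# With string labels (hypothetical), the two accessors need different keys for the same cell.
relabeled = df.copy()
relabeled.index = ['x', 'y', 'z']
by_label = relabeled.loc['y', 'col2']
by_position = relabeled.iloc[1, 1]

print(label_slice.tolist(), position_slice.tolist(), by_label, by_position)
```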

Conditional Selection

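df[df['col1'] > 2]                           # Rows where col1 is greater than 2.
df[(df['col1'] > 1) & (df['col2'] == 'B')]   # Rows matching both conditions; wrap each condition in parentheses.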
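
Run against the sample DataFrame from earlier, the filters above pick out individual rows. This sketch also adds two common variants, | for "or" and isin for membership tests; both are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

greater = df[df['col1'] > 2]                            # only the row at index 2
both = df[(df['col1'] > 1) & (df['col2'] == 'B')]       # only the row at index 1
either = df[(df['col1'] == 1) | (df['col2'] == 'C')]    # rows at index 0 and 2
in_set = df[df['col2'].isin(['A', 'C'])]                # rows whose col2 is 'A' or 'C'

print(greater)
print(both)
print(either)
print(in_set)
```

Python's plain and / or keywords do not work element-wise on a Series, which is why & and | are used here.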

Handling Missing Data

df.isnull(): Returns a boolean DataFrame indicating missing values.

df.dropna(): Drops rows with any missing values.

df.fillna(value): Fills missing values with a specified value.

df.ffill(): Forward-fills missing values with the last valid value above them. The older df.fillna(method='ffill') form is deprecated as of pandas 2.1.
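
The following sketch applies each of these methods to a small DataFrame with two gaps. The column names 'a' and 'b' and the fill value 0 are illustrative choices.

```python
import numpy as np
import pandas as pd

# Illustrative data: one gap in the middle of 'a', one at the top of 'b'.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 5.0, 6.0]})

mask = df.isnull()       # True wherever a value is missing
dropped = df.dropna()    # keeps only rows with no missing values
filled = df.fillna(0)    # replaces every missing value with 0
forward = df.ffill()     # copies the last valid value downward

print(mask)
print(dropped)
print(filled)
print(forward)
```

Forward fill has nothing to copy into a gap in the first row, so the leading NaN in 'b' stays missing. None of these calls modify df; each returns a new DataFrame.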

Data Manipulation

Adding/Modifying Columns

df['new_col'] = df['col1'] * 2   # Adds new_col, or overwrites it if it already exists.

Applying Functions

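df['col1'].apply(lambda x: x * 10)   # Returns a new Series; df itself is unchanged.
# For simple arithmetic like this, df['col1'] * 10 gives the same result and is usually faster.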

Grouping (Group By)

df.groupby('category_col')['value_col'].mean()   # Mean of value_col within each category_col group (placeholder column names).
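
Since category_col and value_col are placeholder names, a concrete run may help. The sketch below builds a tiny DataFrame with made-up values and also shows agg, which computes several statistics per group in one call.

```python
import pandas as pd

# Made-up data using the placeholder column names from the snippet above.
df = pd.DataFrame({
    'category_col': ['a', 'b', 'a', 'b'],
    'value_col': [10, 20, 30, 40],
})

# One mean per group; the group labels become the index of the result.
means = df.groupby('category_col')['value_col'].mean()

# agg accepts a list of statistics and returns one column per statistic.
summary = df.groupby('category_col')['value_col'].agg(['mean', 'sum'])

print(means)
print(summary)
```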

Merging/Joining

pd.merge(df1, df2, on='key', how='inner')   # Keeps only rows whose 'key' value appears in both df1 and df2.
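
The how argument decides which keys survive the merge. As a sketch with two made-up frames (the left_val and right_val columns are illustrative), here is the same merge under three settings:

```python
import pandas as pd

# Made-up inputs: 'b' and 'c' appear in both frames, 'a' and 'd' in only one.
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

inner = pd.merge(df1, df2, on='key', how='inner')   # keys present in both frames
left = pd.merge(df1, df2, on='key', how='left')     # every key from df1
outer = pd.merge(df1, df2, on='key', how='outer')   # every key from either frame

print(inner)
print(left)
print(outer)
```

Rows without a match on the other side get NaN in the columns that come from that side.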

Concatenation

pd.concat([df1, df2], axis=0)   # Stacks df2's rows below df1's; the original index labels are kept.
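
One thing to watch with row-wise concatenation is the index: each frame keeps its own labels unless you ask for a fresh one. A minimal sketch with two made-up single-column frames:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2]})
df2 = pd.DataFrame({'col1': [3, 4]})

# Rows are stacked, but the index labels repeat: 0, 1, 0, 1.
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True discards the old labels and numbers the rows 0..3.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([df1, df2.rename(columns={'col1': 'col2'})], axis=1)

print(stacked)
print(renumbered)
print(side_by_side)
```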

Input / Output

CSV

df = pd.read_csv('file.csv')             # Reads a CSV file into a DataFrame.
df.to_csv('output.csv', index=False)     # Writes the DataFrame without the index column.
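
To try the CSV round trip without touching the disk, both functions also accept file-like objects. The sketch below uses an in-memory io.StringIO buffer in place of the 'file.csv' and 'output.csv' paths; that substitution is an assumption made for the example.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# Write to an in-memory text buffer instead of a file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

With index=False the first line of the output is just the column names, with no leading empty field for the index.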

Excel

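df = pd.read_excel('file.xlsx')            # Needs an Excel engine such as openpyxl installed.
df.to_excel('output.xlsx', index=False)    # Writes the DataFrame without the index column.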