Interactive NumPy & Pandas Guide
Welcome to this guide to NumPy and Pandas. It organizes the standard cheat sheet by topic, from creating NumPy arrays to manipulating Pandas DataFrames, so you can jump directly to the section you need.
NumPy (Numerical Python)
NumPy is the fundamental package for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. This section covers the essentials of creating, inspecting, and manipulating NumPy arrays.
Array Creation
From a list
import numpy as np

# Create a 1-dimensional NumPy array from a Python list.
arr = np.array([1, 2, 3, 4, 5])
# Output: [1 2 3 4 5]
Zeros / Ones / Empty
zeros_arr = np.zeros((3, 4))
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

ones_arr = np.ones((2, 2))
# Output:
# [[1. 1.]
#  [1. 1.]]

empty_arr = np.empty((2, 3))
# Creates an array with uninitialized data; its contents are arbitrary until you assign them.
Range / Linspace
# Values from 0 up to (but not including) 10, with a step of 2.
range_arr = np.arange(0, 10, 2)
# Output: [0 2 4 6 8]

# 5 evenly spaced numbers from 0 to 1 (both endpoints included).
lin_arr = np.linspace(0, 1, 5)
# Output: [0.   0.25 0.5  0.75 1.  ]
Random
# A 2x3 array of random integers from 0 to 9 (the upper bound, 10, is excluded).
rand_int = np.random.randint(0, 10, size=(2, 3))
# Output: (will vary)

# A 2x2 array of random floats in the half-open interval [0.0, 1.0).
rand_float = np.random.rand(2, 2)
# Output: (will vary)
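Because the output above varies from run to run, it can help to seed the generator when you need repeatable results. The following is a minimal sketch using NumPy's `Generator` interface (`np.random.default_rng`) in place of the legacy functions shown above; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Seeded generator: the seed (42) is an arbitrary, illustrative value.
rng = np.random.default_rng(seed=42)

rand_int = rng.integers(0, 10, size=(2, 3))  # integers in [0, 10)
rand_float = rng.random((2, 2))              # floats in [0.0, 1.0)

# A second generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
same_int = rng_again.integers(0, 10, size=(2, 3))
```

The two integer arrays are identical because both generators start from the same seed.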
Array Attributes
arr.shape: Returns a tuple with the dimensions of the array.
arr.ndim: Returns the number of dimensions.
arr.size: Returns the total number of elements.
arr.dtype: Returns the data type of the elements.
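As a quick illustration, the attributes can be read off a small array. This sketch assumes a 3x4 array of zeros, chosen only for the example.

```python
import numpy as np

arr = np.zeros((3, 4))  # illustrative array: 3 rows, 4 columns

shape = arr.shape  # (3, 4)
ndim = arr.ndim    # 2
size = arr.size    # 12, i.e. 3 * 4
dtype = arr.dtype  # float64, the default for np.zeros
```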
Indexing & Slicing
1D Array
arr = np.array([10, 20, 30, 40, 50])
arr[0]    # Output: 10
arr[1:4]  # Output: [20 30 40]
arr[-1]   # Output: 50
2D Array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d[0, 1]    # Output: 2 (row 0, col 1)
arr_2d[:, 1]    # Output: [2 5 8] (all rows, col 1)
arr_2d[1:, :2]  # Rows 1 onward, first two columns.
# Output:
# [[4 5]
#  [7 8]]
Boolean Indexing
arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3]  # Selects elements where the condition is True.
# Output: [4 5]
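Boolean masks can also be combined and used for assignment. The sketch below reuses the same five-element array; the variable names `mask`, `selected`, and `clipped` are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask = (arr > 1) & (arr < 5)
selected = arr[mask]  # [2 3 4]

# A mask can also appear on the left-hand side of an assignment.
clipped = arr.copy()
clipped[clipped > 3] = 3  # [1 2 3 3 3]
```

Working on a copy leaves the original `arr` untouched.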
Basic Operations
Arithmetic
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b  # Output: [5 7 9]
a * 2  # Output: [2 4 6]
a * b  # Output: [ 4 10 18] (element-wise product)
Aggregation
arr = np.array([[1, 2], [3, 4]])
arr.sum()        # Output: 10
arr.sum(axis=0)  # Output: [4 6] (one sum per column; axis 0 is collapsed)
arr.mean()       # Output: 2.5
arr.max(axis=1)  # Output: [2 4] (one max per row; axis 1 is collapsed)
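The `axis` argument is easier to see on a non-square array, where the result shapes differ. This is a small sketch with an assumed 2x3 array; `keepdims=True` is an optional extra that keeps the collapsed axis with length 1.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # shape (2, 3)

col_sums = arr.sum(axis=0)  # [5 7 9]: axis 0 (rows) is collapsed
row_sums = arr.sum(axis=1)  # [ 6 15]: axis 1 (columns) is collapsed
row_max = arr.max(axis=1)   # [3 6]

# keepdims=True retains the collapsed axis, giving shape (2, 1).
row_sums_2d = arr.sum(axis=1, keepdims=True)
```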
Reshaping & Concatenation
arr = np.arange(1, 7)
arr.reshape(2, 3)  # Returns a 2x3 array; arr itself keeps its original shape.
# Output:
# [[1 2 3]
#  [4 5 6]]

a = np.array([1, 2])
b = np.array([3, 4])
np.concatenate((a, b))  # Joins arrays end to end.
# Output: [1 2 3 4]
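For 2D arrays, the same calls take a little more care with shapes. The sketch below shows `reshape` with `-1` (which lets NumPy infer one dimension) and `np.concatenate` along each axis; the arrays `m` and `n` are made up for illustration.

```python
import numpy as np

arr = np.arange(1, 7)

# -1 asks NumPy to infer that dimension: 6 elements / 2 columns = 3 rows.
inferred = arr.reshape(-1, 2)  # [[1 2] [3 4] [5 6]]

m = np.array([[1, 2], [3, 4]])  # shape (2, 2)
n = np.array([[5, 6]])          # shape (1, 2)

# axis=0 stacks rows, so the column counts must match.
stacked = np.concatenate((m, n), axis=0)  # shape (3, 2)

# axis=1 joins columns, so the row counts must match; n.T has shape (2, 1).
side_by_side = np.concatenate((m, n.T), axis=1)  # shape (2, 3)
```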
Pandas
Pandas is a powerful library for data manipulation and analysis. It introduces two primary data structures: the Series (1D) and the DataFrame (2D). Pandas is indispensable for tasks like cleaning messy data, merging datasets, and performing complex selections and aggregations. This section covers the core functionalities you'll use daily.
Data Structures
Series
import numpy as np
import pandas as pd

# A one-dimensional labeled array.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
# Output:
# 0    1.0
# 1    3.0
# 2    5.0
# 3    NaN
# 4    6.0
# 5    8.0
# dtype: float64
DataFrame
# A two-dimensional labeled data structure.
data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# Output:
#    col1 col2
# 0     1    A
# 1     2    B
# 2     3    C
Viewing Data
df.head(): View the first 5 rows (the default; pass a number to change it).
df.tail(): View the last 5 rows (the default; pass a number to change it).
df.info(): Print a concise summary of the DataFrame (column names, non-null counts, dtypes).
df.describe(): Generate descriptive statistics for numerical columns.
df.shape: Get the (rows, columns) tuple.
df.columns: Get the column labels.
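To see these in action, the sketch below builds a small ten-row DataFrame (invented for the example) and inspects it.

```python
import pandas as pd

# Illustrative data: ten rows, one numeric and one text column.
df = pd.DataFrame({'col1': range(10), 'col2': list('ABCDEFGHIJ')})

first_three = df.head(3)  # rows 0-2
last_two = df.tail(2)     # rows 8-9
stats = df.describe()     # by default, only the numeric column col1 is summarized

n_rows, n_cols = df.shape         # (10, 2)
column_names = list(df.columns)   # ['col1', 'col2']
mean_col1 = stats.loc['mean', 'col1']  # 4.5
```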
Selection
Column Selection
df['col1']            # Selects a single column (returns a Series).
df[['col1', 'col2']]  # Selects multiple columns (returns a DataFrame).
Row Selection (by label - `loc`)
df.loc[0]            # Selects the row with label 0.
df.loc[0:2, 'col1']  # Selects rows labeled 0 through 2 (end label included) for 'col1'.
Row Selection (by position - `iloc`)
df.iloc[0]       # Selects the row at position 0.
df.iloc[0:2, 0]  # Selects rows at positions 0 and 1 (end position excluded) for the column at position 0.
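With the default integer index, labels and positions coincide, which can hide the difference between `loc` and `iloc`. The sketch below uses an assumed string index (`'x'`, `'y'`, `'z'`) to make the distinction visible.

```python
import pandas as pd

# Illustrative DataFrame with string labels instead of the default 0, 1, 2.
df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']},
    index=['x', 'y', 'z'],
)

# loc slices by label and includes the end label.
label_slice = df.loc['x':'y']  # rows 'x' and 'y'

# iloc slices by position and excludes the end position.
pos_slice = df.iloc[0:1]       # row 'x' only

single_by_label = df.loc['y', 'col1']  # 2
single_by_pos = df.iloc[1, 0]          # 2
```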
Conditional Selection
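# Rows where col1 is greater than 2.
df[df['col1'] > 2]

# Rows matching both conditions; wrap each condition in parentheses and join with &.
df[(df['col1'] > 1) & (df['col2'] == 'B')]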
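Other common ways to build the filter are `|` (or), `~` (not), and `isin`. The following is a brief sketch on the same three-row DataFrame used earlier.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# | keeps rows where either condition holds.
either = df[(df['col1'] > 2) | (df['col2'] == 'A')]  # rows 0 and 2

# ~ negates a condition.
negated = df[~(df['col1'] > 1)]  # row 0

# isin tests membership in a list of values.
members = df[df['col2'].isin(['A', 'C'])]  # rows 0 and 2
```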
Handling Missing Data
df.isnull(): Returns a boolean DataFrame indicating missing values.
df.dropna(): Drops rows with any missing values.
df.fillna(value): Fills missing values with a specified value.
df.ffill(): Forward-fills missing values (use this instead of df.fillna(method='ffill'), which is deprecated in recent pandas versions).
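The sketch below runs each of these on a small DataFrame with one gap per column. The data and the fill values (`0.0`, `'unknown'`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

missing_per_col = df.isnull().sum()  # count of missing values per column
dropped = df.dropna()                # keeps only row 0, the only complete row

# A dict fills each column with its own replacement value.
filled = df.fillna({'a': 0.0, 'b': 'unknown'})

# Forward fill copies the last valid value downward.
forward = df.ffill()
```

None of these calls modify `df`; each returns a new DataFrame.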
Data Manipulation
Adding/Modifying Columns
df['new_col'] = df['col1'] * 2
Applying Functions
df['col1'].apply(lambda x: x * 10)
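For simple arithmetic like this, a vectorized expression gives the same result as `apply` and is usually the more idiomatic choice. The sketch below compares the two and also shows a row-wise `apply` with `axis=1`; the `labels` column logic is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

applied = df['col1'].apply(lambda x: x * 10)  # calls the function once per element
vectorized = df['col1'] * 10                  # same values, computed on the whole column

# axis=1 passes each row to the function, so several columns can be combined.
labels = df.apply(lambda row: f"{row['col2']}{row['col1']}", axis=1)
```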
Grouping (Group By)
df.groupby('category_col')['value_col'].mean()
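Here is a small runnable version of the call above. The column names `category_col` and `value_col` come from the snippet; the data itself is invented for the example.

```python
import pandas as pd

# Illustrative data: two categories with two values each.
df = pd.DataFrame({
    'category_col': ['a', 'b', 'a', 'b'],
    'value_col': [10, 20, 30, 40],
})

# One mean per category: a -> 20.0, b -> 30.0
means = df.groupby('category_col')['value_col'].mean()

# agg computes several statistics at once, one column per statistic.
summary = df.groupby('category_col')['value_col'].agg(['sum', 'count'])
```

The result is indexed by the group labels, so `means['a']` looks up the mean for category `a`.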
Merging/Joining
pd.merge(df1, df2, on='key', how='inner')
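The `how` argument decides which keys survive the merge. This sketch assumes two small frames that share a `key` column; the `left_val` and `right_val` columns are hypothetical.

```python
import pandas as pd

# Illustrative frames: keys 2 and 3 appear in both.
df1 = pd.DataFrame({'key': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'key': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = pd.merge(df1, df2, on='key', how='inner')  # keys present in both: 2, 3
left = pd.merge(df1, df2, on='key', how='left')    # all keys from df1: 1, 2, 3
outer = pd.merge(df1, df2, on='key', how='outer')  # keys from either: 1, 2, 3, 4
```

In the left merge, key 1 has no match in `df2`, so its `right_val` is missing.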
Concatenation
pd.concat([df1, df2], axis=0)
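With `axis=0` the rows are stacked and the original index labels are kept, which can leave duplicates. The sketch below, using two made-up frames, shows that behavior along with `ignore_index=True` and a column-wise `axis=1` concatenation.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2]})
df2 = pd.DataFrame({'col1': [3, 4]})

# Rows are stacked; index labels repeat (0, 1, 0, 1).
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True renumbers the result 0..3.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places frames side by side, aligning on the index.
side_by_side = pd.concat([df1, df2.rename(columns={'col1': 'col2'})], axis=1)
```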
Input / Output
CSV
df = pd.read_csv('file.csv')           # Read a CSV file into a DataFrame.
df.to_csv('output.csv', index=False)   # Write to CSV without the index column.
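To try the round trip without touching the file system, both functions also accept file-like objects. The sketch below uses an in-memory `io.StringIO` buffer in place of `'file.csv'`; that substitution is an assumption made for the example.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Rewind the buffer and read the same text back into a DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Because `index=False` was used, the first line of `csv_text` is just the header `col1,col2`, and the restored frame gets a fresh default index.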
Excel
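# Reading and writing .xlsx files requires an Excel engine such as openpyxl to be installed.
df = pd.read_excel('file.xlsx')
df.to_excel('output.xlsx', index=False)  # Write to Excel without the index column.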