Tag Archive for: Columns


A common question among R beginners is how to get the values stored in a data frame column. This tutorial covers two scenarios, extraction by column name and extraction by column index, along with the methods available for each.

First, let’s create a data frame to work with.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

#display the market dataframe

print(market)

Result:

You can see the market data frame here:

Let’s discuss them one by one.

Scenario 1: Extract Columns From the Data Frame by Column Name

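In this scenario, we will look at different methods to extract one or more columns from a data frame using column names. The $ operator returns a single column as a vector; the other methods, used here with several columns, return a data frame containing the selected columns.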

Method 1: $ Operator

The $ operator accesses a single column of a data frame by name.

Syntax:

dataframe_object$column

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column to be retrieved.

Example

In this example, we will extract market_name and market_type columns separately.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

#extract market_name column

print(market$market_name)

#extract market_type column

print(market$market_type)

Result:

We can see that the values present in market_name and market_type were returned.

Method 2: Specifying Column Names in a Vector

Here, we pass the names of the columns to extract as a character vector in the column position of the [ ] operator.

Syntax:

dataframe_object[, c(column, ...)]

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract the columns "market_id", "market_squarefeet", and "market_place"

print(market[ , c("market_id", "market_squarefeet", "market_place")])

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 3: subset() With the select Argument

In this case, we use subset() with its select argument to extract columns from the data frame. The first argument is the data frame object, and the second is select, which is an argument of subset() rather than a separate function. The column names are passed to it as a vector.

Syntax:

subset(dataframe_object, select = c(column, ...))

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of each column to be retrieved, passed through the select argument.

Example

In this example, we will extract the “market_id”, “market_squarefeet”, and “market_place” columns in one call using subset() with the select argument.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract the columns "market_id", "market_squarefeet", and "market_place"

print(subset(market, select = c("market_id", "market_squarefeet", "market_place")))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 4: select()

The select() function from the dplyr package takes the names of the columns to extract. The data frame is passed to it with the pipe operator “%>%”. Because select() is part of dplyr, we need to load that library first.

Syntax:

dataframe_object %>% select(column, ...)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract the “market_id”, “market_squarefeet”, and “market_place” columns in one call using the select() function.

library("dplyr")

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract the columns "market_id", "market_squarefeet", and "market_place"

print(market %>% select("market_id", "market_squarefeet", "market_place"))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Scenario 2: Extract Columns From Data Frame by Column Indices

In this scenario, we will look at different methods to extract one or more columns from a data frame using column indices. When several indices are given, the result is a data frame containing the selected columns. Column indices start at 1.

Method 1: Specifying Column Indices in a Vector

Here, we pass the positions of the columns to extract as a numeric vector.

Syntax:

dataframe_object[, c(index, ...)]

Where,

  1. The dataframe_object is the data frame.
  2. The index is the position of each column to be retrieved.

Example

In this example, we will extract the “market_id”, “market_squarefeet”, and “market_place” columns in one call.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract "market_id", "market_squarefeet", and "market_place" using column indices

print(market[ , c(1,5,3)])

Result:

We can see that the columns – “market_id”,”market_squarefeet” and “market_place” were returned.

Method 2: subset() With the select Argument

In this case, we use subset() with its select argument to extract columns from the data frame by column index. The first argument is the data frame object, and the second is select, which is an argument of subset() rather than a separate function. The column indices are passed to it as a vector.

Syntax:

subset(dataframe_object, select = c(index, ...))

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract the “market_id”, “market_squarefeet”, and “market_place” columns in one call using subset() with the select argument.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract "market_id", "market_squarefeet", and "market_place" using column indices

print(subset(market, select = c(1,5,3)))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 3: select()

The select() function from the dplyr package also accepts the positions of the columns to extract. The data frame is passed to it with the pipe operator “%>%”. Because select() is part of dplyr, we need to load that library first.

Syntax:

dataframe_object %>% select(index, ...)

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract the “market_id”, “market_squarefeet”, and “market_place” columns in one call using the select() function.

library("dplyr")

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# extract "market_id", "market_squarefeet", and "market_place" using column indices

print(market %>% select(1,5,3))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Conclusion

This article discussed how to extract columns by column name and by column index using the [ ] operator, subset() with the select argument, and select() from dplyr. To extract a single column as a vector, simply use the “$” operator.





Suppose you need to reorder the columns of an R data frame. This article walks through several ways to do that.

First, let’s create a data frame to work with.

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

#display the market dataframe

print(market)

Result:

You can see the market data frame here:

Let’s discuss them one by one.

Method 1: select() With Column Names

The select() function from the dplyr package returns the columns in the order in which they are listed, so listing every column in a new order reorders the data frame.

It takes the data frame as its first argument, followed by the column names.

Syntax:

select(dataframe_object, column, ...)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column arguments are the column names, given in the desired order.

Example

In this example, we will reorder the columns of the market data frame as market_name, market_place, market_squarefeet, market_id, and market_type.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# reorder the columns: market_name, market_place, market_squarefeet, market_id, market_type

print(select(market, market_name, market_place, market_squarefeet, market_id, market_type))

Result:

From the previous result, we can see that the data frame is returned with respect to the columns provided.

Method 2: select() With Column Indices

The select() function from the dplyr package also accepts column positions and returns the columns in the order in which the positions are listed.

It takes the data frame as its first argument, followed by the column indices.

Syntax:

select(dataframe_object, index, ...)

Parameters:

  1. The dataframe_object is the data frame.
  2. The index arguments are the column positions, given in the desired order.

Example

In this example, we will reorder the columns of the market data frame by position: 2, 3, 5, 1, and 4.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# reorder the columns by index: 2 (market_name), 3 (market_place), 5 (market_squarefeet), 1 (market_id), 4 (market_type)

print(select(market, 2, 3, 5, 1, 4))

Result:

From the previous result, we can see that the data frame is returned with respect to the column indices provided.

Method 3: select() With order()

The order() function returns the positions that would sort a vector. Applied to colnames(dataframe_object), it gives the column indices in alphabetical order of the column names, and select() then reorders the columns by those indices. The decreasing argument of order() takes a Boolean value: FALSE (the default) sorts the column names in ascending order, and TRUE sorts them in descending order. The data frame is passed to select() with the %>% operator.

Syntax:

dataframe_object %>% select(order(colnames(dataframe_object), decreasing = FALSE))

Parameters:

  1. The colnames(dataframe_object) call returns the column names, which are passed to order().
  2. The decreasing argument chooses between ascending (FALSE) and descending (TRUE) order of the column names.

Example 1

In this example, we will reorder the columns of the market data frame alphabetically by name in ascending order.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# reorder the columns alphabetically by name in ascending order

print(market %>% select(order(colnames(market), decreasing = FALSE)))

Result:

From the previous result, we can see that the data frame is reordered with respect to the column names in ascending order.

Example 2

In this example, we will reorder the columns of the market data frame alphabetically by name in descending order.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# reorder the columns alphabetically by name in descending order

print(market %>% select(order(colnames(market), decreasing = TRUE)))

Result:

From the previous result, we can see that the data frame is reordered with respect to the column names in descending order.

Method 4: arrange()

The arrange() function in the dplyr package sorts the rows of a data frame by the values of a column, in ascending order by default. The data frame is passed to it with the %>% operator. To sort in descending order, wrap the column in desc().

Note that arrange() changes the order of the rows, not the order of the columns: every column keeps its position, and the rows are rearranged according to the values in the specified column.

Syntax for ascending order:

dataframe_object %>% arrange(column)

Syntax for descending order:

dataframe_object %>% arrange(desc(column))

Parameter:

It takes the column whose values determine the row order.

Example 1

In this example, we will sort the rows of the data frame by the market_place column values in ascending order.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# sort the rows in ascending order of market_place

print(market %>% arrange(market_place))

Result:

Here, the rows are sorted by the market_place column values in ascending order, and the values in the other columns move with their rows.

Example 2

In this example, we will sort the rows of the data frame by the market_place column values in descending order.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# sort the rows in descending order of market_place

print(market %>% arrange(desc(market_place)))

Result:

We can see that the rows are sorted by the market_place column values in descending order, and the values in the other columns move with their rows.

Method 5: arrange_all()

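The arrange_all() function in the dplyr package sorts the rows of the data frame in ascending order using every column as a sort key, starting with the first column. Like arrange(), it does not change the column order. In recent dplyr releases, arrange_all() is marked as superseded, although it still works.

Syntax:

arrange_all(dataframe_object)

Parameter:

It takes the data frame object as its main argument.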

Example

In this example, we will sort the rows of the data frame using the arrange_all() function.

library(dplyr)

# create the market data frame with 4 rows and 5 columns

market <- data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'), market_type=c('grocery','bar','grocery',
'restaurant'), market_squarefeet=c(120,342,220,110))

# sort the rows using all columns as sort keys, starting with market_id

print(arrange_all(market))

Result:

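The rows are sorted by the values in all columns, starting with market_id. Because market_id is already in ascending order and has no ties, the row order matches the original data frame, and the column order is unchanged.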

Conclusion

In this article, we looked at five dplyr-based methods. The select() function reorders the data frame columns using column names or column indices, and combining it with order() sorts the columns alphabetically by name in ascending or descending order. The arrange() function sorts the rows by the values of a column in ascending or descending order, and arrange_all() sorts the rows using all columns as sort keys; neither of them changes the column order.





In this article, we discuss Pandas, an open-source Python library that provides high-performance, ready-to-use data structures and data analysis tools. We will also cover the DataFrame, the advantages of Pandas, and how to select multiple columns of a DataFrame. Let’s get started!

What is Pandas in Python?

Pandas is an open-source Python library that provides efficient, ready-to-use data structures and tools for data analysis. It is built on top of NumPy and is widely used for data science and analytics. NumPy is a lower-level library that handles multi-dimensional arrays and a variety of mathematical array operations; Pandas offers a higher-level interface on top of it. It also has robust time-series capabilities and efficient alignment of tabular data. The primary data structure in Pandas is the DataFrame, a 2-D data structure that lets us store and modify tabular data. Pandas provides many DataFrame operations, such as data manipulation, concatenation, merging, and grouping.

What is a DataFrame?

The DataFrame is the most essential and widely used data structure in Pandas. It stores data in rows and columns, just like an SQL table or a spreadsheet.

Advantages of Pandas

Many users wish that SQL included capabilities such as Gaussian random number generation or quantiles, because it is hard to express a procedural idea in an SQL query. Users may say, “If only I could write this in Python and switch back to SQL quickly,” and Pandas provides a tabular data type with well-designed interfaces that allow them to do exactly that. The alternatives are more verbose, such as a dedicated procedural language like Oracle’s PL/SQL or Postgres’ PL/pgSQL, or a low-level database interface. Pandas has a one-line SQL read interface (pd.read_sql) and a one-line SQL write interface (DataFrame.to_sql), and its DataFrame is comparable to R data frames.
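As a rough illustration of that round trip, the sketch below writes a DataFrame to a database, reads it back with a query, and computes a per-group median in Python. The in-memory SQLite database, the sales table, and its columns are assumptions made up for this example; they are not part of the original article.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database (assumption for this sketch); nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table contents, used only for illustration.
sales = pd.DataFrame({"region": ["north", "south", "north"],
                      "amount": [10.0, 20.0, 30.0]})

# One-line write: DataFrame -> SQL table.
sales.to_sql("sales", conn, index=False)

# One-line read: SQL query -> DataFrame.
loaded = pd.read_sql("SELECT region, amount FROM sales", conn)

# A step that is awkward in plain SQL: the median (0.5 quantile) per region.
medians = loaded.groupby("region")["amount"].quantile(0.5)
print(medians)

conn.close()
```

With a real database, the connection object would typically come from a database driver or SQLAlchemy engine instead of sqlite3.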

Another significant advantage is that charting libraries such as Seaborn can treat DataFrame columns as high-level graph attributes. So, Pandas provides a reasonable way of managing tabular data in Python, along with convenient storage and charting APIs.

Option 1: Using the Basic Key Index

import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Age': [27, 24, 22, 32]}

df = pd.DataFrame(data)

# select several columns by passing a list of column names
print(df[['Name', 'Age']])

Output:

  Name  Age
0    A   27
1    B   24
2    C   22
3    D   32
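The difference between single and double brackets is easy to miss, so here is a small sketch of it, reusing the Name and Age columns from the example above. A single label returns a Series, while a list of labels returns a DataFrame with the columns in the listed order.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                   "Age": [27, 24, 22, 32]})

# Single brackets with one label return a Series.
one_col = df["Age"]

# A list of labels returns a DataFrame, with columns in the order listed.
sub_df = df[["Age", "Name"]]

print(type(one_col).__name__)
print(list(sub_df.columns))
```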

Option 2: Using .loc[]

import pandas as pd

data = {'Fruit': ['Apple', 'Banana', 'Grapes', 'Orange'],
        'Price': [160, 100, 60, 80]}

df = pd.DataFrame(data)

# .loc selects by label; the row slice 0:2 includes both endpoints
print(df.loc[0:2, ['Fruit', 'Price']])

Output:

    Fruit  Price
0   Apple    160
1  Banana    100
2  Grapes     60
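Because .loc slices by label and includes both endpoints, the row slice 0:2 returns three rows here. The sketch below, using the same fruit data, contrasts that with a positional .iloc slice, which excludes the end, and with a full-row selection using a bare colon.

```python
import pandas as pd

df = pd.DataFrame({"Fruit": ["Apple", "Banana", "Grapes", "Orange"],
                   "Price": [160, 100, 60, 80]})

# Label-based slice: rows labeled 0, 1, and 2 (end label included).
by_label = df.loc[0:2, ["Fruit", "Price"]]

# Position-based slice: rows at positions 0 and 1 (end position excluded).
by_position = df.iloc[0:2, [0, 1]]

# A bare colon keeps every row and selects only the listed columns.
all_rows = df.loc[:, ["Fruit", "Price"]]

print(len(by_label), len(by_position), len(all_rows))
```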

Option 3: Using .iloc[]

import pandas as pd

data = {'Dog': ['A', 'B', 'C', 'D'],
        'Age': [2, 4, 3, 1]}

df = pd.DataFrame(data)

# .iloc selects by position; 0:2 picks the columns at positions 0 and 1
print(df.iloc[:, 0:2])

Output:

  Dog  Age
0   A    2
1   B    4
2   C    3
3   D    1
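The .iloc indexer also accepts a list of positions, which helps when the columns you want are not adjacent. The sketch below adds a hypothetical Weight column (not in the original example) so that there are three columns to choose from.

```python
import pandas as pd

# "Weight" is a made-up column added only for this illustration.
df = pd.DataFrame({"Dog": ["A", "B", "C", "D"],
                   "Age": [2, 4, 3, 1],
                   "Weight": [5.0, 12.5, 8.0, 3.2]})

# A list of positions selects non-adjacent columns.
first_and_last = df.iloc[:, [0, 2]]

# The order of the list sets the order of the columns in the result.
reordered = df.iloc[:, [2, 0]]

# Negative positions count from the end: the last two columns.
last_two = df.iloc[:, -2:]

print(list(first_and_last.columns))
print(list(reordered.columns))
print(list(last_two.columns))
```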

Option 4: Using .ix[] (Removed in pandas 1.0)

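import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Roll number': [21, 25, 19, 49]}

df = pd.DataFrame(data)

# .ix was deprecated in pandas 0.20 and removed in pandas 1.0,
# so df.ix[:, 0:2] only runs on old versions.
# On current pandas, the positional equivalent is .iloc:
print(df.iloc[:, 0:2])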

Output:

  Name  Roll number
0    A           21
1    B           25
2    C           19
3    D           49
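Code that relied on .ix to mix labels and positions can usually be rewritten with .loc or .iloc. The sketch below shows one possible approach on the same data: convert positions to labels through df.columns, or convert labels to positions with Index.get_indexer. Treat it as an illustration rather than the only migration path.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                   "Roll number": [21, 25, 19, 49]})

# Row labels plus column positions: turn the positions into labels, then use .loc.
first_two_cols = df.columns[0:2]
by_label = df.loc[0:1, first_two_cols]

# Row positions plus column labels: turn the labels into positions, then use .iloc.
col_positions = df.columns.get_indexer(["Roll number"])
by_position = df.iloc[0:2, col_positions]

print(by_label.shape)
print(list(by_position.columns))
```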

Conclusion

We discussed Pandas in Python, the DataFrame, the advantages of Pandas, and how to select multiple columns of a DataFrame. We covered four options for selecting multiple columns: basic key indexing, “.loc”, “.iloc”, and “.ix”. Note that “.ix” was removed in pandas 1.0, so current code should use “.loc” or “.iloc” instead.


