In AWS, an IAM role is an AWS identity, much like an IAM user. The IAM service is an intricate one which, if not configured wisely, can lead to security issues. A role has policies attached to it that decide what the identity is and is not allowed to do. A role is not tied to a single person; it can be assumed by anyone who requires it. Instead of the long-term credentials (password or access keys) of an IAM user, an IAM role provides temporary security credentials. When a user, application, or service needs access to AWS resources for which it does not hold permissions, it assumes a role for that purpose and uses the temporary security credentials issued for it.

What Will We Cover?

In this guide, we will see how to use the IAM “PassRole” permission. As a specific example, we will see how to connect an EC2 instance to an S3 bucket using the PassRole permission.

Important Terms and Concepts

AWS service role: It is a role assumed by a service so that it can perform the tasks on behalf of the user or account holder.

AWS service role for an EC2 instance: It is a role assumed by an application running on an Amazon EC2 instance to perform the tasks in the user account that are allowed by this role.

AWS service-linked role: It is a role that is predefined and directly attached to an AWS service, like the RDS service-linked role for launching an RDS DB.

Using the Passrole Permission to Connect an EC2 Instance with S3

Many AWS services need a role for configuration, and this role is passed to them by the user. In this way, services assume the role and perform tasks on behalf of the user. For most services, the role needs to be passed only once, while configuring that service. A user requires permission to pass a role to an AWS service. This is a good thing from a security point of view, since administrators can control which users can pass a role to a service. The “PassRole” permission is granted by an administrator to an IAM user, role, or group to allow it to pass a role to an AWS service.

To elaborate on the previous concept, consider a case where an application running on an EC2 instance requires access to an S3 bucket. For this, we can attach an IAM role to the instance so that the application gets the S3 permissions defined in the role. The application needs temporary credentials for authentication and authorization. EC2 obtains temporary security credentials when a role is associated with the instance running our application. These credentials are then made available to our application to access S3.

To grant an IAM user the capability to pass a role to the EC2 service at the time of launching an instance, we need three things:

  1. An IAM permissions policy for the role that decides the scope of the role.
  2. A trust policy attached to the role which allows the EC2 to assume the role and use the permissions defined inside the role.
  3. An IAM permission policy for the IAM user that lists the roles which it can pass.

Let’s do it in a more pragmatic way. We have an IAM user with limited permissions. We attach an inline policy to this user that allows launching EC2 instances and passing an IAM role to a service. Then, we create a role for S3 access; let’s call it “S3Access”, and attach an IAM policy to it. In this role, we only allow reading of S3 data using the AWS managed “AmazonS3ReadOnlyAccess” policy.

Steps to Create the Role

Step 1. From the IAM console of the administrator (root), click on “Roles” and then select “Create role”.

Step 2. From the “Select trusted entity” page, select “AWS service” under the “Trusted entity type”.

Step 3. Under the “Use case”, select the radio button corresponding to “EC2” under “Use cases for other AWS services”:

Step 4. On the next page, assign the “AmazonS3ReadOnlyAccess” policy:

Step 5. Give a name to your role (“S3Access” in our case). Add a description for this role. The following trust policy is automatically created with this role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            }
        }
    ]
}

Step 6. Click on “Create role” to create the role:

IAM Policy for User

This policy gives the IAM user full EC2 permissions and permission to associate the “S3Access” role with the instance.

Step 1. From the IAM console, click on Policies and then on “Create policies”.

Step 2. On the new page, select the JSON tab and paste the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:*"],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::Account_ID:role/S3Access"
        }
    ]
}

Replace the “Account_ID” placeholder with your AWS account ID.
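
For instance, with a hypothetical 12-digit account ID, the resource line would read as follows (the ID shown here is only an example):

"Resource": "arn:aws:iam::123456789012:role/S3Access"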

Step 3. (Optional) Give tags for your policy.

Step 4. Put a suitable name for the policy (“IAM-User-Policy” in our case) and click the “Create policy” button and attach this policy to your IAM user.

Attaching the “S3Access” Role to the EC2 Instance

Now, we will attach this role to our instance. Select your instance from the EC2 console and go to “Action > Security > Modify IAM role”. On the new page, select the “S3Access” role from the drop down menu and save it.
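
If you prefer the command line, the same association can be made with the AWS CLI. The sketch below is illustrative: the instance ID is a placeholder, and it assumes an instance profile named “S3Access” exists (the console creates one with the same name when the role is created for EC2):

$ aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=S3Access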

Verifying the Setup

Now, we will check if our EC2 instance is able to access the S3 bucket created by the administrator. Log in to the EC2 instance and install the AWS CLI. Then, run the following command on the EC2 instance:
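
The exact command is not reproduced in this text; a typical way to test read access with the AWS CLI is to list the buckets, or the contents of a specific bucket (the bucket name below is a placeholder):

$ aws s3 ls

$ aws s3 ls s3://your-bucket-name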

Again, run the same command from the IAM account configured on your local machine. You will notice that the command is successfully executed on the EC2 instance, but we get an “access denied” error on the local machine:

The error is expected because we have only granted the S3 access permission to the EC2 instance, not to the IAM user or any other AWS service. Another important thing to note is that we did not make the bucket and its objects publicly accessible.

Conclusion

In this guide, we demonstrated how to use the PassRole permission in AWS. We successfully managed to connect an EC2 instance to S3. It is a very important concept if you care about granting the least privileges to your IAM users.





This post originally appeared on the Academy Software Foundation’s (ASWF) blog. The ASWF works to increase the quality and quantity of contributions to the content creation industry’s open source software base.

Tell us a bit about yourself – how did you get your start in visual effects and/or animation? What was your major in college?

I started experimenting with the BASIC programming language when I was 12 years old on a Sinclair ZX81 home computer, playing a game called “Lunar Lander” which ran on 1K of RAM and took about 5 minutes to load from cassette tape.

I have a Bachelor’s degree in Cognitive Science and Computer Science.

My first job out of college was a Graphics Engineer at Wavefront Technologies, working on the precursor to Maya 1.0 3D animation system, still used today. Then I took a Digital Artist role at Digital Domain.

What is your current role?

Co-Founder / CEO at Wevr. I’m currently focused on Wevr Virtual Studio – a cloud platform we’re developing for interactive creators and teams to more easily build their projects on game engines.

What was the first film or show you ever worked on? What was your role?

First film credit: True Lies, Digital Artist.

What has been your favorite film or show to work on and why?

TheBlu 1.0 digital ocean platform. Why? We recently celebrated TheBlu 10 year anniversary. TheBlu franchise is still alive today. At the core of TheBlu was/is a creator platform enabling 3D interactive artists/developers around the world to co-create the 3D species and habitats in TheBlu. The app itself was a mostly decentralized peer-to-peer simulation that ran on distributed computers with fish swimming across the Internet. The core tenets of TheBlu 1.0 are still core to me and Wevr today, as we participate more and more in the evolving Metaverse.

How did you first learn about open source software?

Linux and Python were my best friends in 2000.

What do you like about open source software? What do you dislike?

Likes: Transparent, voluntary collaboration.

Dislikes: Nothing.

What is your vision for the Open Source community and the Academy Software Foundation?

Drive international awareness of the Foundation and OSS projects.

Where do you hope to see the Foundation in 5 years?

A global leader in best practices for real-time engine-based production through international training and education.

What do you like to do in your free time?

Read books, listen to podcasts, watch documentaries, meditation, swimming, and efoiling!

Follow Neville on Twitter and connect on LinkedIn.  







The OpenGEH Project is one of the many projects at LF Energy. We want to share it here on the LF blog. This post originally appeared on the LF Energy site.

OpenGEH ( GEH stands for Green Energy Hub ) enables fast, flexible settlement and hourly measurements of production and consumption of electricity. OpenGEH seeks to help utilities to onboard increased levels of renewables by reducing the administrative barriers of market-based coordination. By utilizing a modern DataHub, built on a modular and microservices architecture, OpenGEH is able to store billions of data points covering the entire workflow triggered by the production and consumption of electricity.

The ambition of OpenGEH is to use digitalization as a way to accelerate a market-driven transition towards a sustainable and efficient energy system. The platform provides a modern foundation for both new market participants and facilitates new business models through digital partnerships. The goal is to create access to relevant data and insights from the energy market and thereby accelerate the Energy Transition.

OpenGEH was initially built in partnership with Microsoft. Energinet (the Danish TSO) was seeking a critical leverage point to accelerate the Danish national commitment to 100% renewable energy in the electricity system by 2030. For most utilities, onboarding renewables creates a technical challenge that also brings choreography and administrative hurdles. Data becomes the mechanism that enables the market coordination leading to increased decarbonization. The software was contributed to the LF Energy Foundation by Energinet.

Energinet sees open source and shared development as an opportunity to reduce the cost of software while simultaneously increasing the quality and pace of development. It is an approach that they see gaining prominence in TSO cooperation. Energinet is not an IT company and therefore does not sell systems or services to, or operate systems for, other TSOs. Open source, coupled with an intellectual property license that encourages collaboration, will ensure that OpenGEH continues to improve by encouraging a community of developers to add new features and functionality.





Alan Shimel 00:06
Hey, everyone back here live in Austin at the Linux Foundation Open Source Summit. You know, we’ve had a very security-heavy lineup this past week. And for good reason, security is top of mind to everyone. The OpenSSF. Of course, Monday was OpenSSF day, but it’s more than that. More than Monday, we really talked a lot about software supply chains and SBOMs and just securing open source software. My next guest is CGrove or CRbn? No, no, you know, I had CRob in my mind, and that’s what messed me up. Let’s go back to Crob. Excuse me. Now check this out a little thing myself. So Crob was actually the emcee of OpenSSF day on Monday.

CRob 01:01
I had an amazing hat. You did. And you didn’t wear it here. I came from outside with tacos, and it was all sweaty.

Alan Shimel 01:08
We just have two bald guys here. Anyway,

CRob 01:14
safety in numbers.

Alan Shimel 01:15
Well, yeah, that’s true. It’s true. Wear the hat next time. But anyway, first of all, welcome, man. Thank you.

CRob 01:21
It’s wonderful to be here. I’m excited to have this little chat.

Alan Shimel 01:24
We are excited to have you on here. So before we jump into Monday, and OpenSSF day, in that whole thing, you’re with Intel, full disclosure, what do you do in your day job.

CRob 01:36
So my day job, I am the Director of Security Communications. So primarily our function is as incidents happen, so there’s a new vulnerability discovered, or researchers find some report on our portfolio, I help kind of evaluate that and kind of determine how we’re going to communicate it.

Alan Shimel 01:56
Love it, and your role within OpenSSF?

CRob 02:01
So I’ve been with the OpenSSF for over two years, almost from the beginning. And currently I am the working group lead for the developer best practices working group and the vulnerability disclosures working group. I sit on the technical advisory committee, so we help kind of shape, steer the strategy for the foundation. I’m on the Public Policy and Government Affairs Committee. And I’m just now the owner of two brand new SIGs, special interest groups underneath the working group. So I’m in charge of the education SIG, and the open source cert SIG. So we’re going to create a PSIRT for open source.

Alan Shimel 02:38
That’s beautiful man. That is really and let’s talk about that SIRT. Yeah, it’ll be through Linux Foundation.

Unknown Speaker 02:47
Yeah, we are still. So back in May the foundation and some contributors created the mobilization plan. I’m sure people have talked about it this week. 10 point plan addressing trying to help respond to things like the White House executive order. And it’s a plan that says these 10 different work streams we feel we can improve the security posture of open source software. And the open source SIRT was stream five. And the idea is to try to find a collection of experts from around the industry that understand how to do incident response, and also understand how to get things fixed within open source communities.

CRob 03:27
So we’re we have our first meeting for the SIG the first week of July. And we’re going to try to refine the initial plan and kind of spec it out and see how we want to react. But I think ultimately, it’s going to be kind of a mentorship program for upstream communities to teach them how to do incident response. We know and help them work with security researchers and reporters, and also help make sure that they’ve got tools and processes in place so they can be successful.

Alan Shimel 03:56
I love it. Yeah. Let’s be honest, this is a piece of work you cut out for yourself.

Unknown Speaker 04:04
Yes, one of my other groups I work with is a group called FIRST, the Forum of Incident Response and Security Teams. And I’m one of the authors of the PSIRT services framework. So I have a little help. So I understand that you got a vendor back on that, right? Yeah, we’re gonna lean into that as kind of a model to start with, and kind of see what we need to change to make it work for open source communities.

Alan Shimel 04:27
I actually love that good thing. When do you think we might see something on this? No pressure.

Unknown Speaker 04:32
No pressure? Oh, definitely. The meetings will be public. So all of that will go up into YouTube. So you’ll be able to observe kind of the progress of the group. I expect we’re going to take probably at least a month to refine the current plan and submit a proposal back to the governing board. We think this is actionable. So hopefully before the end of the year, maybe late fall, we’ll actually be able to start taking action.

Alan Shimel 04:57
All right. Love it. Love it. Gotta ask you, Where does the name come from?

Unknown Speaker 05:03
So the name comes from Novell GroupWise. So back in the day, our network was run by an HP VAX. But our email system plugged into the VAX and you were limited by the characters of your name. So my name Chris Robinson. So his first little first letter, first name, next seven of your last, so I ended up being Crobinsoe. And we hired a developer that walked in, he looked at it, and he’s like, ah, Crobinso the chromosome, right? Got shortened to Crob.

Alan Shimel 05:36
Okay, not very cool. So thank you. Not Crob. That’s right. Thank you Novell is right. That was very interesting days. Remember.

Unknown Speaker 05:45
I love that stuff. I was Novell engineer for many years.

Alan Shimel 05:49
That’s when certs really meant something certified Novell. You are? Yeah. Where are they now? See, I think the last time I was out in Utah. Now I was I think it was 2005. I was out in Utah, they would do if there was something they were working on.

Unknown Speaker 06:14
They bought SUSE. And we thought that that would be pretty amazing to kind of incorporate this Novell had some amazing tools. Absolutely. So we thought that would be really awesome than the NDS was the best. But we were hoping that through SUSE they be able to channel these tools and get broader adoption.

Alan Shimel 06:30
No, I think for whatever reason. There’s a lot of companies from back in those days, right, that we think about, indeed, Yeah. Anyway,

Unknown Speaker 06:45
My other working group. So we have more, but wait, there’s more, we have more. So the developer best practices working group is spinning off and education sake. So a lot of the conference this week is talking about how we need to get more training and certification and education into the hands of developers. So again, we’ve created another kind of Tiger team, are we’re gonna be focusing on this. And my friend, Dr. David Wheeler, David A. Wheeler, he had a big announcement where we have existing body of material, the secure coding fundamentals class, and he was able to transform that into SCORM. So now that anybody who has a SCORM learning management system has the ability to leverage this free developer secure software training, really, yes.

Alan Shimel 07:35
And that’s the SCORM. system. If you have SCORM, you can leverage this.

Unknown Speaker 07:39
free, there’s some rules behind it. But yeah, absolutely. It’s plugged in, we’re looking to get that donated to higher education, historically black colleges and universities (HBCU), trade schools like DeVry, wherever

Alan Shimel 07:52
Get it into people’s hands. That’s the thing to do. So that get that kind of stuff gets me really excited. I’ll be honest with you, you know, all too often, we’re good in the tech industry for forming a foundation and, and a SIG and an advisory board. But rubber meets the road, when you can teach people coming up. Right, so they come in with the right habits, because you know, it’s harder to teach the old dogs, the new tricks, right.

CRob 08:23
I can’t take the class. I know the brains full.

Alan Shimel 08:26
Yeah, no, I hear you. But no, but not only that, look, if you’ve been developing software for 25 years, and I’m gonna come and tell you, Well, what you doing is wrong. And I need you to start doing it this way. Now, I’m gonna make some progress. Because no one wants to say I know everything. And I’m not changing. People don’t just say that. But it’s just almost subconsciously, it’s a lot harder.

Unknown Speaker 08:51
It definitely is. And that’s kind of informing our approach. So we have a traditional, about 20 hours worth of traditional class material. So we’re looking at how we can transform that material into things like webinars and podcasts, and maybe a boot camp. So maybe next year, at the Open Source Summit, we might be able to offer a training class where you walk in, take the class, and walk out with a certification.

CRob 09:17
And then thinking about, you know, we have a lot of different learners. We have, you know, brand new students, we have people in the middle of their careers, people are making career changes. So we have to kind of serve all these different constituents. And that’s absolutely true. And that is one of the problems. Kind of the user journeys we’re trying to fulfill is this. I’m an existing developer, how do I gain new skills or refine what I have?

Alan Shimel 09:40
Let me ask you a question. So, I come from the security side of that. Nothing the matter with putting the emphasis on developers developing more secure software. But shouldn’t we also be developing for security people to better secure open source software.

CRob 10:02
And the foundation itself does have many, it’s multipronged. And so to help like a practitioner, we have things like our scorecard and all stars. And then we have a project criticality score. And actually, we just I, there was a great session just a couple hours ago, by one of my peers, Jacque Chester, and it was kind of a, if you’re a risk guy, it was kind of based off of Open Fair, which is a risk management methodology, kind of explaining how we can evaluate open source projects, share that information with downstream consumers and risk management teams or procurement teams, and kind of give them a quantitative assessment of this is what risks you could incur by these projects.

CRob 10:44
So if you have two projects that do the same thing, one might have a higher or lower score will provide you the data that you could make your own assessment off of that and make your own judgment. So that the foundation is also looking at just many different avenues to get this out there, focused on practitioners and developers, and hopefully by this kind of hydraulic approach, it will be successful. It’ll stick.

Alan Shimel 11:07
you know what you just put as much stuff on the wall and whatever sticks sticks man up. So anyway, hey Crob. Right. I got it right. Yep. All right. Thank you for stopping by. So thank you for all you do, right. I mean, it’s a community thing. These are not paid type of gigs, right. Sure. Yeah. No, and I thank you for your for your time and efforts on that.

CRob 11:30
Thank you very much. All right.

Alan Shimel 11:31
Hey, keep up the great work. We’re gonna take a break. I think we’ve got another interview coming up in a moment. And we’re here live in Austin.





“Visual representations of data include graphs and charts. Your goal as a data scientist is to make sense of vast amounts of information. Data analysis involves three procedures: obtaining the data, cleaning and altering it, and constructing a visual display from it to evaluate it further. Data visualizations such as plots are tremendous tools for making complicated analysis easier to understand. But first, let’s go through some fundamental plotting principles like scatter plots. A scatterplot is a diagram that presents the levels of two numerical variables in a set of data as geometrical points within a Cartesian diagram.”

What is the Scatter Plot in the R Programming Language in Ubuntu 20.04?

Scatter plots are used to compare variables. When we need to know how much one variable is influenced by another, we compare the two variables. The scatter plot is a group of dotted points on the x and y axes that represent distinct pieces of data. The layout of the generated points demonstrates a correlation between two variables when their values are displayed along the X-axis and Y-axis.

Syntax of the Scatter Plot in the R Programming Language in Ubuntu 20.04

In R, you can make a scatterplot in a variety of ways. plot(x, y), in which the x and y parameters are numerical vectors specifying the (x, y) positions to plot, is the most basic function.

plot(x, y, main, xlab, ylab, xlim, ylim, axes)

As mentioned above, the x and y parameters are mandatory to graph the scatter plot, but the scatter plot also supports some optional parameters, which are described as follows:

x: The horizontal coordinates are set with this option.

y: The vertical coordinates are set with this option.

xlab: The label for the horizontal axis.

ylab: The label for the vertical axis.

main: The title of the chart is defined by the parameter main.

xlim: The xlim parameter sets the range of x values shown.

ylim: The ylim parameter sets the range of y values shown.

axes: This option determines whether the plot should include both axes.

How to Construct the Scatter Plot in R in Ubuntu 20.04?

Let’s look at an example to show how we can use the plot function to create a scatterplot. We will utilize the sample dataset in our examples, which is a preconfigured dataset in the R environment.

Example # 1: Using the Plot Method for Constructing the Scatter Plot in R in Ubuntu 20.04

The plot() method in the R Programming Language can be used to make a scatter plot.

To construct the scatter plot, we need the data set. So here, we have inserted the data set USArrests from the R language. We have selected the two columns from this data set for making the scatter plot. The first few entries are shown of the data set USArrests. Then, we have the plot function where the two inputs, x, and y, are set. For x input, the column “Murder” is selected, and for the y input, we have the “UrbanPop” column. Some optional inputs are passed inside the function, like labels for x and y are set with the xlab and ylab. Within the xlim and ylim range, the values of the x and y parameters are set. Also, the title of the scatter plot is set by calling the option “main.”
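
The example code is not reproduced in this text, so here is a minimal sketch of what is described above; the exact xlim/ylim ranges and the title text are assumptions:

# scatter plot of two columns of the built-in USArrests data set
data(USArrests)
head(USArrests)
plot(x = USArrests$Murder, y = USArrests$UrbanPop,
     xlab = "Murder", ylab = "UrbanPop",
     xlim = c(0, 20), ylim = c(0, 100),
     main = "Murder vs UrbanPop")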

The output of the scatterplot is generated below.

Example # 2: Using the Pairs Method for Constructing Scatter Plot Matrices in R in Ubuntu 20.04

We utilize a scatterplot matrix when we have multiple variables and want to correlate one variable with the others. Scatterplot matrices are created using the pairs() method.

Here, we have selected the sample dataset iris from the r language. Then, print the top six entries of the iris data set. To the columns of the data set iris, we have applied the pair method. Each column will be paired with the remaining column in the pair function.
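
A minimal sketch of this example follows; restricting pairs() to the four numeric columns is an assumption, since the Species column of iris is a factor:

# scatterplot matrix of the iris measurements
data(iris)
head(iris)
pairs(iris[, 1:4], main = "Scatterplot matrix of the iris data set")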

The scatterplot metrics are visualized in the following figure.

Example # 3: Using Fitted Values in a Scatterplot in R in Ubuntu 20.04

You can expand the graph by adding a new level of data. In linear regression, you can visualize the fitted value. For constructing a scatterplot, we use the ggplot2 package’s ggplot() and geom_point() methods.

To begin with this example, we have imported the ggplot2 package. Then, we have utilized the ggplot method where the dataset name “mtcars” is given. The “aes” function is used inside the ggplot method to take the logs of the x and y variables. For linear regression, an additional layer, “stat_smooth”, is used. The smoothing method is controlled by the stat_smooth() option. The standard error (se) is kept false, and the size of the line is set to the value 1.
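
Since the code itself is not shown here, the following is a minimal sketch under those assumptions; the choice of the mpg and drat columns is also an assumption:

library(ggplot2)
# scatter plot of log-transformed mtcars columns with a fitted regression line
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, size = 1)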

Example # 4: Using a Dynamic Name for the Scatter Plot Title in R in Ubuntu 20.04

We haven’t put any data on the plots yet. Informational graphs are required. Without resorting to extra documentation, the reader should be able to understand the message behind the analysis of data just by glancing at the plot. As a result, good labels are required when using plots. Labels can be added using the labs() function.

We have a variable here as scatter_graph to which the ggplot method is assigned. The ggplot set its parameter the same as the above example but for a different data set. The dataset used here is the iris. Then, we have again utilized the scatter_garph variable, and this time, we have set the dynamic names to the scatter plot.
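
A minimal sketch of this example, assuming the Sepal.Length and Sepal.Width columns and illustrative label text:

library(ggplot2)
# build the base scatter plot for the iris data set
scatter_graph <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
# add a title and axis labels with labs()
scatter_graph + labs(title = "Sepal width vs sepal length",
                     x = "Sepal length (cm)", y = "Sepal width (cm)")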

You can see the additional information on the scatterplot inside the following figure.

Example # 5: Using the 3dscatterplot Method for Constructing the Scatter Plot in R in Ubuntu 20.04

The scatterplot3d package lets you make a three-dimensional scatterplot. Scatterplot3d is a useful technique that uses (x, y, z) syntax.

We have included the scatterplot3d module inside our r script above. Now, we can use the scatterplot3d function. To the scatterplot3d function, we have passed three parameters which are the columns selected from the dataset ToothGrowth.
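
A minimal sketch of this example; converting the supp factor to numeric so it can serve as an axis is an assumption:

library(scatterplot3d)
# 3D scatter plot of three columns from the ToothGrowth data set
scatterplot3d(ToothGrowth$dose, as.numeric(ToothGrowth$supp), ToothGrowth$len,
              xlab = "dose", ylab = "supp", zlab = "len")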

The 3D scatterplot is rendered in the following graph snap.

Conclusion

This article aims to brief you about the scatter plot in R. Scatter plots are dispersion graphs used to display data points for parameters (usually two, but three is possible). The primary purpose of the R scatter plot is to help visualize the data and whether numeric variables have any relationship. We have seen various approaches that help us create a scatterplot in a very easy way. Each method has its own functionality and is very easy to understand.





Infrastructure as Code or IaC is a new approach in which coding is used to set up an infrastructure. This means instead of manually setting up VMs, networks, and other components of a network, we write code that describes the infrastructure and simply run that code to get the desired state. Terraform has emerged as an outstanding tool that uses the IaC approach.

Like many other tasks, Terraform can be used to create and manage an AWS S3 bucket. Versioning means keeping several versions, or you may simply call them variants of a file. Versioning in AWS S3 can be used to maintain and restore different variants of the object stored inside it. This has many benefits. For example, we can restore accidentally deleted items.

What Will We Cover?

In this guide, we will see how to enable versioning on an S3 bucket using Terraform. We are working on the Ubuntu 20.04 system for this tutorial. Let us get started now.

What Will You Need?

  1. Basics of Terraform
  2. Access to the internet
  3. Terraform installed on your system. Check by running terraform -version.

Creating AWS S3 Bucket Using Terraform

Now that we have seen a little bit about Terraform and, hopefully, you have installed it on your local machine, we can continue our task of working with S3. As mentioned earlier, Terraform uses several configuration files for provisioning resources, and each of these files must reside in its respective working folder/directory. Let us create a directory for this purpose.

Step 1. Start by creating a folder that will contain all the configuration files, and then change your terminal directory to the following:

$ mkdir linuxhint-terraform && cd linuxhint-terraform

Step 2. Let us create our first configuration file, “variables.tf”, that will contain the information about our AWS region and the name of the bucket we want to create:

Now, put the following text inside it and save the file:

variable "aws_region" {
  description = "The AWS region to create the S3 bucket in."
  default     = "us-east-1"
}

variable "bucket_name" {
  description = "A unique name for the bucket"
  default     = "tecofers-4"
}

“tecofers-4” is the name of our bucket, and you can use your own name here.

Step 3. Make a “main.tf” file that will contain the definition for our infrastructure.

Now, put the following configuration inside it:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }
  }

  required_version = ">= 0.14.9"
}

provider "aws" {
  region                  = var.aws_region
  shared_credentials_file = "/home/Your_User_Name/.aws/credentials"
  profile                 = "profile1"
}

resource "aws_s3_bucket" "bucket1" {
  bucket = var.bucket_name

  tags = {
    Name = "ExampleS3Bucket"
  }
}

resource "aws_s3_bucket_acl" "bucket1" {
  bucket = var.bucket_name
  acl    = "private"
}

resource "aws_s3_bucket_versioning" "bucket_versioning" {
  bucket = var.bucket_name

  versioning_configuration {
    status = "Enabled"
  }
}

Change “Your_User_Name” to the user name of your system. Let us look at the parameters used in the previous files:

bucket: It is an optional parameter; when specified, it creates a bucket with the given name. If this argument is not present, Terraform gives the bucket a random, unique name. The bucket’s name needs to be in lowercase, with the length not exceeding 63 characters.

Shared_credentials_file: It is the path of the file containing the credentials of the AWS users.

Profile: It specifies the user’s profile for creating the S3 bucket.

The “aws_s3_bucket” and “aws_s3_bucket_acl” resources provide a bucket and an ACL configuration for the bucket, respectively. The “acl” argument is optional and accepts an Amazon-designed set of predefined grants.

Similarly, the resource “aws_s3_bucket_versioning” provides a resource for version control on an S3 bucket. The versioning_configuration block defined in this block contains the required configuration for this purpose. The status argument is mandatory and can contain a single value from among: Enabled, Disabled, and Suspended.

Initializing the Terraform Directory

To download and install the provider we defined in our configuration, we need to initialize the directory containing these files:
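
The command itself is not reproduced in this text; a Terraform working directory is initialized with:

$ terraform init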

Building the Infrastructure

Now that we have prepared our configuration files, we can apply the changes using the following command:
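
The command is not shown above; the usual invocation is:

$ terraform apply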

Enter “yes” on the terminal when prompted. When Terraform finishes its work, the following message appears:

Verifying the Procedure

Now, let us check if the desired S3 bucket is created. Head to the S3 console and check for the available buckets:

Since our bucket was created successfully, we can now upload files to it and create new folders here.

Delete the resources you created when you do not need them. This will save you from unwanted charges on AWS:
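
The command is not shown above; the resources created by this configuration can be removed with:

$ terraform destroy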

Conclusion

In this guide, we have learned about enabling versioning on an S3 bucket using Terraform. We created a bucket and applied versioning to it. There are so many things we can do using Terraform to simplify our infrastructure deployment.





“In Python, PySpark is a Spark module used to provide Spark-like processing using a DataFrame, which stores the given data in row and column format.

PySpark – pandas DataFrame represents the pandas DataFrame, but it holds the PySpark DataFrame internally.

Pandas support DataFrame data structure, and pandas are imported from the pyspark module.

Before that, you have to install the pyspark module.”

Command

Syntax to import:

from pyspark import pandas

After that, we can create or use the dataframe from the pandas module.

Syntax to create pandas DataFrame:

pyspark.pandas.DataFrame()

We can pass a dictionary or list of lists with values.

Let’s create a pandas DataFrame through pyspark that has four columns and five rows.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],
'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

print(pyspark_pandas)

Output:

Now, we will go into our tutorial.

There are several ways to return the top and last rows from the pyspark pandas dataframe.

Let’s see them one by one.

pyspark.pandas.DataFrame.head

head() will return rows from the top of the pyspark pandas dataframe. It takes n as a parameter that specifies the number of rows displayed from the top. By default, it will return the top 5 rows.

Syntax:

pyspark_pandas.head(n)

Where pyspark_pandas is the pyspark pandas dataframe.

Parameter:

n specifies an integer value that displays the number of rows from the top of the pyspark pandas dataframe.

We can also use the head() function to display a specific column.

Syntax:

pyspark_pandas.column.head(n)

Example 1

In this example, we will return the top 2 and 4 rows in the mark1 column.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display top 2 rows in mark1 column

print(pyspark_pandas.mark1.head(2))

print()

#display top 4 rows in mark1 column

print(pyspark_pandas.mark1.head(4))

Output:

0 90

1 56

Name: mark1, dtype: int64

0 90

1 56

2 78

3 54

Name: mark1, dtype: int64

We can see that the top 2 and 4 rows were selected from the marks1 column.

Example 2

In this example, we will return the top 2 and 4 rows in the student_lastname column.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display top 2 rows in student_lastname column

print(pyspark_pandas.student_lastname.head(2))

print()

#display top 4 rows in student_lastname column

print(pyspark_pandas.student_lastname.head(4))

Output:

0 manasa

1 trisha

Name: student_lastname, dtype: object

0 manasa

1 trisha

2 lehara

3 kapila

Name: student_lastname, dtype: object

We can see that the top 2 and 4 rows were selected from the student_lastname column.

Example 3

In this example, we will return the top 2 rows from the entire dataframe.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display top 2 rows

print(pyspark_pandas.head(2))

print()

#display top 4 rows

print(pyspark_pandas.head(4))

Output:

student_lastname mark1 mark2 mark3

0 manasa 90 100 91

1 trisha 56 67 92

student_lastname mark1 mark2 mark3

0 manasa 90 100 91

1 trisha 56 67 92

2 lehara 78 96 98

3 kapila 54 89 97

We can see that the entire dataframe is returned with the top 2 and 4 rows.

pyspark.pandas.DataFrame.tail

tail() will return rows from the last in the pyspark pandas dataframe. It takes n as a parameter that specifies the number of rows displayed from the last.

Syntax:

pyspark_pandas.tail(n)

Where pyspark_pandas is the pyspark pandas dataframe.

Parameter:

n specifies an integer value that displays the number of rows from the last of the pyspark pandas dataframe. By default, it will return the last 5 rows.

We can also use the tail() function to display specific columns.

Syntax:

pyspark_pandas.column.tail(n)

Example 1

In this example, we will return the last 2 and 4 rows in the mark1 column.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display last 2 rows in mark1 column

print(pyspark_pandas.mark1.tail(2))

 

print()

 

#display last 4 rows in mark1 column

print(pyspark_pandas.mark1.tail(4))

Output:

3 54

4 67

Name: mark1, dtype: int64

1 56

2 78

3 54

4 67

Name: mark1, dtype: int64

We can see that the last 2 and 4 rows were selected from the marks1 column.

Example 2

In this example, we will return the last 2 and 4 rows in the student_lastname column.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display last 2 rows in student_lastname column

print(pyspark_pandas.student_lastname.tail(2))

 

print()

 

#display last 4 rows in student_lastname column

print(pyspark_pandas.student_lastname.tail(4))

Output:

3 kapila

4 hyna

Name: student_lastname, dtype: object

1 trisha

2 lehara

3 kapila

4 hyna

Name: student_lastname, dtype: object

We can see that the last 2 and 4 rows were selected from the student_lastname column.

Example 3

In this example, we will return the last 2 rows from the entire dataframe.

#import pandas from the pyspark module

from pyspark import pandas

 

#create dataframe from pandas pyspark

pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})

 

#display last 2 rows

print(pyspark_pandas.tail(2))

 

print()

 

#display last 4 rows

print(pyspark_pandas.tail(4))

Output:

student_lastname mark1 mark2 mark3

3 kapila 54 89 97

4 hyna 67 32 87

student_lastname mark1 mark2 mark3

1 trisha 56 67 92

2 lehara 78 96 98

3 kapila 54 89 97

4 hyna 67 32 87

We can see that the entire dataframe is returned with the last 2 and 4 rows.

Conclusion

We saw how to display the top and last rows from the pyspark pandas dataframe using the head() and tail() functions. By default, they return 5 rows. The head() and tail() functions are also used to get the top and last rows of specific columns.





One day, Person X asked Person Y, “How do you get the values present in a data frame column in the R language?” Person Y answered, “There are many ways to extract columns from the data frame,” and requested Person X to check this tutorial.

There are many ways to extract columns from the data frame. In this article, we will discuss two scenarios with their corresponding methods.

Now, we will see how to extract columns from a data frame. First, let’s create a data frame.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#display the market dataframe

print(market)

Result:

You can see the market data frame here:
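
The screenshot of the result is not reproduced in this text; based on the values above, the printed data frame looks roughly like this:

  market_id market_name market_place market_type market_squarefeet
1         1          M1        India     grocery               120
2         2          M2          USA         bar               342
3         3          M3        India     grocery               220
4         4          M4    Australia  restaurent               110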

Let’s discuss them one by one.

Scenario 1: Extract Columns From the Data Frame by Column Name

In this scenario, we will see different methods to extract column/s from a data frame using column names. It returns the values present in the column in the form of a vector.

Method 1: $ Operator

The $ operator will be used to access the data present in a data frame column.

Syntax:

dataframe_object$column

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column to be retrieved.

Example

In this example, we will extract market_name and market_type columns separately.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract market_name column

print(market$market_name)

#extract market_type column

print(market$market_type)

Result:

We can see that the values present in market_name and market_type were returned.

Method 2: Specifying Column Names in a Vector

Here, we are specifying column names to be extracted inside a vector.

Syntax:

dataframe_object[,c(column,….)]

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place"

print(market[ , c("market_id", "market_squarefeet", "market_place")])

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 3: subset() With select()

In this case, we are using subset() with a select parameter to extract column names from the data frame. It takes two parameters. The first parameter is the data frame object, and the second parameter is the select() method. The column names through a vector are assigned to this method.

Syntax:

subset(dataframe_object,select=c(column,….))

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved via the select() method.

Example

In this example, we will extract “market_id”,”market_squarefeet” and “market_place” columns at a time using subset() with select parameter.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place"

print(subset(market, select=c("market_id", "market_squarefeet", "market_place")))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 4: select()

The select() method takes the column names to be extracted; the data frame is piped into it using the “%>%” operator. The select() method is available in the dplyr library. Therefore, we need to load this library.

Syntax:

dataframe_object %>% select(column,….)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract “market_id”,”market_squarefeet”, and “market_place” columns at a time using the select() method.

library("dplyr")

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place"

print(market %>% select("market_id", "market_squarefeet", "market_place"))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Scenario 2: Extract Columns From Data Frame by Column Indices

In this scenario, we will see different methods to extract column/s from a data frame using column index. It returns the values present in the column in the form of a vector. Index starts with 1.

Method 1: Specifying Column Indices in a Vector

Here, we are specifying column indices to be extracted inside a vector.

Syntax:

dataframe_object[,c(index,….)]

Where,

        1. The dataframe_object is the data frame.
        2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”,”market_squarefeet”, and “market_place” columns at a time.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place" using column indices

print(market[ , c(1,5,3)])

Result:

We can see that the columns – “market_id”,”market_squarefeet” and “market_place” were returned.

Method 2: subset() With select()

In this case, we are using subset() with select parameters to extract columns from the data frame with column indices. It takes two parameters. The first parameter is the dataframe object and the second parameter is the select() method. The column indices through a vector are assigned to this method.

Syntax:

subset(dataframe_object,select=c(index,….))

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time using the subset() method with select parameter.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place" using column indices

print(subset(market, select=c(1,5,3)))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Method 3: select()

The select() method takes the column indices to be extracted from the data frame and loaded into the data frame object using the “%>%” operator. The select() method is available in the dplyr library. Therefore, we need to use this library.

Syntax:

dataframe_object %>% select(index,….)

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”,”market_squarefeet”, and “market_place” columns at a time using the select() method.

library("dplyr")

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#extract columns - "market_id", "market_squarefeet" and "market_place" using column indices

print(market %>% select(1,5,3))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.

Conclusion

This article discussed how we could extract the columns through column names and column indices using the select() and subset() methods with select parameters. And if we want to extract a single column, simply use the “$” operator.





Consider a requirement that you need to reorder the columns in an R data frame. How can you do that? Go through this article to get the solution for the given requirement.

Now, we will see how to reorder the columns in the data frame. First, let’s create a data frame.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#display the market dataframe

print(market)

Result:

You can see the market data frame here:

Let’s discuss them one by one.

Method 1: select() With Column Names

The select() method available in the dplyr library is used to select the columns provided in the order inside this method.

It takes two parameters. The first parameter represents the DataFrame object, and the second parameter represents the column names.

Syntax:

select(dataframe_object,column,…………)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column represents the column names in which the data frame is ordered based on these columns.

Example

In this example, we will reorder the columns in the market-dataframe: market_name, market_place, market_squarefeet, and market_id,market_type.

library(dplyr)

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe – market_name,market_place,market_squarefeet, market_id and market_type

print(select(market,market_name,market_place,market_squarefeet, market_id,market_type))

Result:

From the previous result, we can see that the data frame is returned with respect to the columns provided.

Method 2: select() With Column Indices

The select() method available in the dplyr library is used to select the columns provided in the order inside this method.

It takes two parameters. The first parameter represents the DataFrame object, and the second parameter represents the column indices.

Syntax:

select(dataframe_object,index,…………)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column represents the column indices in which the data frame is ordered based on these columns.

Example

In this example, we will reorder the columns in the market-dataframe: 2, 3, 5, 1, and 4.

library(dplyr)

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe – market_name,market_place,market_squarefeet, market_id and market_type

print(select(market,2,3,5,1,4))

Result:

From the previous result, we can see that the data frame is returned with respect to the column indices provided.

Method 3: select() With order()

The select() method takes the order() method as a parameter to reorder the data frame columns in ascending or descending order. It takes two parameters. The first parameter is the order() method, and the second parameter is decreasing, which takes Boolean values. FALSE specifies reordering the data frame based on the column names in ascending order, and TRUE specifies reordering it based on the column names in descending order. Finally, the data frame object is piped into the select() method using the %>% operator.

Syntax:

dataframe_object %>% select(order(colnames(dataframe_object ),decreasing))

Parameters:

  1. The colnames(dataframe_object) call returns the column names, which are passed to the order() method.
  2. The decreasing argument is used to reorder the data frame in ascending or descending order.

Example 1

In this example, we will reorder the columns in the market-dataframe in ascending order.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe alphabetically in ascending order

print(market %>% select(order(colnames(market),decreasing = FALSE)))

Result:

From the previous result, we can see that the data frame is reordered with respect to the column names in ascending order.

Example 2

In this example, we will reorder the columns in the market-dataframe by descending order.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe alphabetically in descending order

print(market %>% select(order(colnames(market),decreasing = TRUE)))

Result:

From the previous result, we can see that the data frame is reordered with respect to the column names in descending order.

Method 4: arrange()

The arrange() method in the dplyr library is used to sort the data frame based on a column in ascending order. The data frame is piped into it using the %>% operator. It is also possible to arrange the data frame in descending order by wrapping the column in the desc() method.

Note that, based on the values in the specified column, arrange() reorders the rows of the data frame rather than its columns.

Syntax for ascending order:

dataframe_object %>% arrange(column)

Syntax for descending order:

dataframe_object %>% arrange(desc(column))

Parameter:

It takes only one parameter, i.e., the column whose values determine the new row order.

Example 1

In this example, we will reorder the columns in the data frame based on market_place column values in ascending order.

library(dplyr)

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe in ascending order based on market_place

print(market %>% arrange(market_place))

Result:

Here, the rows are reordered based on the market_place column values in ascending order.

Example 2

In this example, we will reorder the columns in the data frame based on market_place column values in descending order.

library(dplyr)

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4), market_name=c('M1','M2','M3','M4'),
                  market_place=c('India','USA','India','Australia'),
                  market_type=c('grocery','bar','grocery','restaurent'),
                  market_squarefeet=c(120,342,220,110))

#reorder the market-dataframe in descending order based on market_place

print(market %>% arrange(desc(market_place)))

Result:

We can see that the rows are reordered based on the market_place column values in descending order.

Method 5: arrange_all()

The arrange_all() method in the dplyr library is used to sort the rows of the data frame by every column, starting with the first column, in ascending order.

Syntax:

arrange_all(dataframe_object)

Parameter:

It takes only one parameter, i.e., the DataFrame object.

Example

In this example, we will sort the rows of the data frame using the arrange_all() method.

library(dplyr)

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurant'),market_squarefeet=c(120,342,220,110))

#sort the rows of the market-dataframe by all of the columns in ascending order

print(arrange_all(market))

Result:

We can see that the rows of the data frame are sorted by all of its columns in ascending order.

Conclusion

In this article, we have seen five different methods to reorder a data frame. The select() method reorders the columns using column names and column indices. Next, we combined order() with select() to reorder the columns by name in both increasing and decreasing order, and we saw how to sort the rows based on column values using the arrange() method. Finally, we used arrange_all() to sort the rows of the data frame by all of its columns.





When a production sensor fails, you may only be able to collect accurate measurements on four of the assembly line's six measurement points. Or one of the marks on a quality sheet is illegible and you are left without samples for a whole shift. Situations like these can influence your statistical computations, and many procedures do not handle missing data gracefully. In this article, we will look at a few different techniques to get rid of NA values in R, which lets you restrict your computations to the rows of an R data frame that meet a specific level of completeness.

When no data is available for one or more variables, or for an entire unit, it is considered missing data. Missing values are a major issue in everyday environments. In R, NA (Not Available) entries are used to represent missing records. Many datasets arrive as data frames with missing values, either because the values exist but were not collected or because they never existed at all.

How to Get Rid of the NA Values in the R Programming Language in Ubuntu 20.04?

The symbol NA is used in R to signify missing (not available) values, and it can appear in any column of a data frame. In this article, we will look at several ways to get rid of NA values from vectors and data frames.

Example # 1: Using is.na Method to Remove NA in R in Ubuntu 20.04

We can use is.na() to eliminate NA values from a vector. is.na() returns a logical vector marking which elements are NA; indexing the vector with the negation of this result keeps only the non-NA values.
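
Since the original code listing is not reproduced here, the following is a minimal sketch of what the example describes, assuming the vector is named V1 (as mentioned below) and filled with placeholder values:

#create a vector V1 with some random numbers and NA values
V1 = c(12, NA, 7, 25, NA, 3)

#print the original vector, which still shows the NA values
print(V1)

#keep only the elements that are not NA by negating is.na()
print(V1[!is.na(V1)])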

In the example above, we have a vector where some random numbers are included along with NA values, and printing it shows the NA values as well. To remove them, we pass V1 to the is.na() function and negate the result to index the vector, which eliminates every occurrence of NA. The output from this operation displays the numbers only.

Example # 2: Using the na.rm Method to Remove NA in R in Ubuntu 20.04

We may also deal with na values while evaluating the sum, mean, and variance. The na.rm argument is used to get rid of na during these computations: if na.rm=TRUE, na values are ignored; if na.rm=FALSE, they are included.
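
The original listing is not shown, so here is a minimal sketch of the steps the example describes, assuming the vector is named Vec and using placeholder values:

#create a vector Vec that has some numbers and NA values
Vec = c(5, NA, 14, 9, NA, 21)

#evaluate the variance, sum, and mean while ignoring the NA values
print(var(Vec, na.rm = TRUE))
print(sum(Vec, na.rm = TRUE))
print(mean(Vec, na.rm = TRUE))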

So, we start by creating a vector collection that has some numbers and NA values; this collection is stored inside the variable Vec. The NA values are first skipped while evaluating the variance with var(). Then, we evaluate the sum and the mean of Vec in the same way. Note that na.rm is set to TRUE in each call, which makes the functions ignore the NA values in the vector.

Example # 3: Using the na.omit Method to Remove NA in R in Ubuntu 20.04

The na.omit() method eliminates NA values directly, returning the non-NA values along with the indexes of the discarded NA values. This is the simplest choice: na.omit() returns the data without any na values in any of the rows, and in the R language it is the quickest technique to eliminate na rows.
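
A minimal sketch of the example described below, assuming the vector variable is named integers (as mentioned in the description) and using placeholder values:

#initialize the variable integers with a vector that contains NA values
integers = c(3, NA, 8, NA, 15, 42)

#print the original vector; the NA values are still visible
print(integers)

#remove the NA values with na.omit and print the cleaned vector
print(na.omit(integers))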

Here, we have initialized the variable integers with a vector and printed it, so the output shows some NA values. To remove them, we pass the integers variable to the na.omit() function. Printing the result again confirms that the NA values have been removed from the vector.

Example # 4: Using the complete.cases Method to Remove NA in R in Ubuntu 20.04

For various kinds of data analysis in the R language, a complete data frame without any missing values is required, and the complete.cases() method helps with this. This R function examines a data frame and returns a logical vector indicating which rows contain no missing values; indexing the data frame with this vector keeps only the complete rows.
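
Since the original data frame is not shown, the following is a minimal sketch with hypothetical column names; the cleaned result is stored in data2, as the description below mentions:

#create a data frame in which every column contains some NA values
data = data.frame(x = c(1, NA, 3, 4),
                  y = c(NA, 20, 30, 40),
                  z = c(100, 200, NA, 400))

#keep only the rows that have no missing values
data2 = data[complete.cases(data), ]

#print the data frame without the incomplete rows
print(data2)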

As in the preceding examples, we work with vectors of values, but this time we eliminate the NA values from a data frame. For this, we create a data frame in which every column contains some NA values. Then, we call the complete.cases() function with the data frame as input and use its result to index the data frame. The variable data2 holds the outcome, and printing it shows that the rows with NA values have been removed.

Example # 5: Using the rowSums Method to Remove NA in R in Ubuntu 20.04

R has the built-in method rowSums(), which generates the sum for every row of a data collection in the form rowSums(x). Additional parameters can be specified, the most significant of which is the Boolean na.rm argument, which tells the function whether to skip NA values. Combined with is.na() and ncol(), rowSums() can be used to find the rows in which every value is NA.
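
A minimal sketch of what the example describes, assuming a hypothetical data frame named data whose third row consists entirely of NA values:

#create a data frame whose third row is entirely NA
data = data.frame(x = c(1, 2, NA, 4),
                  y = c(10, NA, NA, 40),
                  z = c(100, 200, NA, 400))

#keep the rows in which the count of NA values is less than the number of columns
print(data[rowSums(is.na(data)) != ncol(data), ])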

After creating the data frame inside the variable data, we apply the rowSums() method together with the is.na() and ncol() methods. A row is dropped only when its count of NA values equals the number of columns, so only the third row, which is entirely NA, is removed; the other rows are kept even though they also contain some NA values.

Example # 6: Using the filter Method to Remove NA in R in Ubuntu 20.04

We can also use the tidyverse dplyr package to drop just the rows where all values are missing. To do this, we can utilize a combination of the dplyr package's filter() function and Base R's is.na() function. We will show how to delete only the rows in which every entry is NA.
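
A minimal sketch of the example described below, using the same kind of hypothetical data frame with an all-NA third row; the filter condition keeps a row as long as at least one of its values is not NA:

library(dplyr)

#create a data frame whose third row is entirely NA
data = data.frame(x = c(1, 2, NA, 4),
                  y = c(10, NA, NA, 40),
                  z = c(100, 200, NA, 400))

#keep only the rows where not every value is NA
print(filter(data, rowSums(is.na(data)) != ncol(data)))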

Using the dplyr package for the filter() function, we create the data frame, apply filter() to it, and display the output, which has removed the third row because every value in it is NA.

Conclusion

At this stage of the session, we have learned to remove na values that appear one or more times in vectors or data frames in the R language. We have covered six methods that help us remove the na values from the given data. These methods are quite easy to implement in the R scripting language and can remove NA values from rows and columns alike. Also, some of the methods require the R dplyr package to eliminate the NA values.


