This script attaches state and region data to each tweet, a prerequisite for any analysis of the regional distribution of terms. It also tabulates how many tweets each CSV file contains per search term, which characterizes the data being analyzed and may help explain the eventual geographic distribution.
After running the setup code and the code that creates the join_tweets_to_states function, simply replace the names of my CSV files, such as “plurals” in the “Second Person Plurals” section, with the names of your own. For join_tweets_to_states to work, you will need an API key from the Census Bureau, as mentioned in the previous section.
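If you have not yet stored a key, tidycensus can write it to your .Renviron in a single step; a minimal sketch (the key string is a placeholder):

library(tidycensus)
census_api_key("YOUR-CENSUS-API-KEY", install = TRUE) # saves the key to .Renviron for future sessions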
# Script-specific options or packages
library(skimr)
join_tweets_to_states <- function(tweets) {
  # Takes a data frame result from an rtweet::search_tweets() query
  # and maps the tweets with lat/lng values to US states
  #
  # Requires a US Census API Key: https://api.census.gov/data/key_signup.html
  # that is added to the .Renviron file with tidycensus::census_api_key()
  library(tidyverse, quietly = TRUE) # data manipulation
  library(tidycensus)                # us state geometries
  library(sf)                        # spatial joins and coordinate handling

  # Get the spatial geometries for each US state/territory
  states_sf <-
    get_acs(geography = "state",        # states
            variables = "B01003_001",   # total population
            geometry = TRUE) %>%        # spatial geometry
    select(state_name = NAME, geometry) # only keep state name and geometry columns

  # Align the CRS for tweets to the census geometries
  tweets_sf <-
    tweets %>%                         # original data frame
    filter(lat != "") %>%              # only keep tweets with lat/lng values
    st_as_sf(coords = c("lng", "lat"), # map x/long, y/lat values to coords
             crs = st_crs(states_sf))  # align coordinate reference system

  # Map tweet coords to census state geometries
  tweets_states_sf <- st_join(tweets_sf, states_sf)

  tweets_states_df <-
    tweets_states_sf %>%     # spatial features object
    as_tibble() %>%          # convert to tibble (remove spatial features)
    filter(state_name != "") # keep only US states/territories

  return(tweets_states_df) # return the final data frame
}
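As an illustration of how the function is meant to be called once one of the datasets below has been read in (the plurals_states name is hypothetical):

plurals_states <- join_tweets_to_states(plurals) # attach state names to each geotagged tweet
plurals_states %>%
  count(state_name, search_term, sort = TRUE) # tally terms by state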
plurals <- read_csv(file = "../data/original/twitter/plurals.csv") # read dataset from disk
## Warning: One or more parsing issues, see `problems()` for details
plurals %>%
  count(search_term, sort = TRUE) # preview search term counts
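To see exactly what triggered the parsing warning above, readr’s problems() function reports the affected rows and columns; for example:

problems(plurals) # rows and columns readr could not parse as expected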
sale <- read_csv(file = "../data/original/twitter/sale.csv") # read dataset from disk
## Warning: One or more parsing issues, see `problems()` for details
sale %>%
  count(search_term, sort = TRUE) # preview
shoes <- read_csv(file = "../data/original/twitter/shoes.csv") # read dataset from disk
shoes %>%
  count(search_term, sort = TRUE) # preview
tonix <- read_csv(file = "../data/original/twitter/tonix.csv") # read dataset from disk
tonix %>%
  count(search_term, sort = TRUE) # preview
roads <- read_csv(file = "../data/original/twitter/roads.csv") # read dataset from disk
roads %>%
  count(search_term, sort = TRUE) # preview
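Since skimr is loaded in the setup chunk, a quick structural overview of any of these datasets is also available; a sketch:

skim(roads) # column types, missing values, and summary statistics per variable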
Running this script outputs the number of tweets gathered for each search term and ensures that the census information can be used throughout the rest of the project.