comments | difficulty | edit_url | tags | |
---|---|---|---|---|
true |
Easy |
|
Table: Emails
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | email | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to find all unique email domains and count the number of individuals associated with each domain. Consider only those domains that end with .com.
Return the result table orderd by email domains in ascending order.
The result format is in the following example.
Example 1:
Input: Emails table: +-----+-----------------------+ | id | email | +-----+-----------------------+ | 336 | [email protected] | | 489 | [email protected] | | 449 | [email protected] | | 95 | [email protected] | | 320 | [email protected] | | 411 | [email protected] | +----+------------------------+ Output: +--------------+-------+ | email_domain | count | +--------------+-------+ | outlook.com | 2 | | yahoo.com | 1 | +--------------+-------+ Explanation: - The valid domains ending with ".com" are only "outlook.com" and "yahoo.com", with respective counts of 2 and 1. Output table is ordered by email_domains in ascending order.
First, we filter out all emails ending with .com
, then use the SUBSTRING_INDEX
function to extract the domain name of the email. Finally, we use GROUP BY
to count the number of each domain.
# Write your MySQL query statement below
SELECT SUBSTRING_INDEX(email, '@', -1) AS email_domain, COUNT(1) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY 1
ORDER BY 1;
import pandas as pd
def find_unique_email_domains(emails: pd.DataFrame) -> pd.DataFrame:
emails["email_domain"] = emails["email"].str.split("@").str[-1]
emails = emails[emails["email"].str.contains(".com")]
return (
emails.groupby("email_domain")
.size()
.reset_index(name="count")
.sort_values(by="email_domain")
)