Obinna Emelike
A cross section of the dignitaries at the programme inauguration in Lagos
LEA Reply™ Drone for Grupo Trino
Logistics Reply delivered an end-to-end logistics solution for PepsiCo warehouses operated by Grupo Trino, covering all processes from WMS implementation to stocktaking using drones.
Have you ever considered streamlining your warehouse processes using drone solutions? Find out how Logistics Reply helped Grupo Trino, a third-party logistics provider specialized in (but not limited to) the food sector, improve its inbound and outbound processes, delivering value to its entire supply chain.
Grupo Trino has sought out the best market practices
integrating technology and innovation in its operations
To increase productivity and ensure the safety of its operators, Grupo Trino has adopted drones in its central warehouse for PepsiCo to perform cyclical and monthly inventories.
Grupo Trino is one of the only logistics companies in Brazil to use this technology
Our customer says: "This digital and technological transformation brings us many benefits. The main one is that we improve the accuracy and efficiency of our stocktaking processes."
The partnership between Logistics Reply and Grupo Trino started in 2019
Grupo Trino was looking for a technology provider that could help it improve its logistics processes, combining leading technologies and strong processes that would allow the company to grow in a very competitive market.
After extensive research, including product capability demos, Grupo Trino chose Logistics Reply as a partner to improve its supply chain.
Starting with a challenging project for PepsiCo, one of the biggest food and beverage companies in the world, Grupo Trino and Reply delivered an end-to-end warehouse operation in 2019, including a highly innovative stocktaking solution.
Grupo Trino and Reply then delivered a full solution for another huge PepsiCo operation, at a site that includes industrial production of cookies and a major distribution center. Based in the Brazilian market, with its very challenging logistics infrastructure and quite different HDI between regions, Grupo Trino needed to support 24/7 operations with high volumes and aggressive service-level agreements for customers that are leaders in their segments. This would allow the company to manage operations for huge multinational companies and beat multinational 3PL providers in bids. Together, Grupo Trino and Reply also implemented the full Reply WMS solution for one of the biggest multinational home appliance manufacturers from the USA, with sites in Brazil. The companies are now working on a roadmap that will cover all Grupo Trino customers and new leads.
Our award-winning drone stocktaking solution completely changes the way companies perform their stock audits, and it has already been chosen by several companies from different sectors. Why do our customers trust and choose Logistics Reply's drone stocktaking solution? The average time to audit a warehouse stock location is less than 5 seconds, including all effort spent on indirect activities such as battery replacement and moving from one stock location to another. That is much faster than any conventional stocktaking method.
The drone automatically recognizes the barcodes of the stock location and the handling unit (HU), then compares the expected HU with the actual HU found in the location, allowing the operator to analyze possible stock divergences from the comfort of their workstation. Operators no longer need forklifts, ladders or any other equipment to perform the stocktaking, which reduces errors and helps avoid workplace incidents and accidents, especially when reaching high shelf locations. Throughout the flight, the drone is under full control and equipped with obstacle-avoidance mechanisms.
With LEA Reply™, stocktaking progress and outcomes can be easily monitored in real time, giving the company a clear and accurate view of stock on hand. While the solution natively integrates with our LEA Reply™ WMS and Click Reply™ WMS, stocktaking results can also be shared via API interfaces or Excel files, making it easy to activate the solution without complex projects.
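The divergence check at the heart of this workflow can be sketched in a few lines: take the handling unit the WMS expects at each stock location and compare it with what the drone actually scanned. This is a hypothetical illustration of the logic, not LEA Reply code; all names and sample data below are invented.

```python
def find_divergences(expected, scanned):
    """Compare expected handling units (HUs) per stock location
    with the HUs the drone actually scanned."""
    divergences = {}
    for location, expected_hu in expected.items():
        actual_hu = scanned.get(location)  # None: nothing found in the location
        if actual_hu != expected_hu:
            divergences[location] = {"expected": expected_hu, "found": actual_hu}
    return divergences

# WMS snapshot vs. drone scan results (invented sample data)
expected = {"A-01-01": "HU1001", "A-01-02": "HU1002", "A-01-03": "HU1003"}
scanned = {"A-01-01": "HU1001", "A-01-02": "HU2002"}  # A-01-03 scanned empty

print(find_divergences(expected, scanned))
# prints {'A-01-02': {'expected': 'HU1002', 'found': 'HU2002'},
#         'A-01-03': {'expected': 'HU1003', 'found': None}}
```

Only the mismatching locations reach the operator for review, which is what keeps the analysis manageable even for very large warehouses.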
Trino Garcia and Adam Vasquez went viral when their kiss on a bridge was shared by photographer Henry Jiménez Kerbox. Today, with more than 2 million followers on TikTok, the Angeleno couple is challenging perceptions of masculinity.
Adam Vasquez and Trino Garcia walked across a bridge overlooking the 110 Freeway
It was their first time being photographed as a couple
holding hands and sharing a kiss in public
“Adam” is inked on the right and “Mexicano” on the left
with “Trino” on the left and “Chicano” on the right
he reveals a tender tribute to his childhood
family and God — etched into his skin is Charlie Brown
the face of his daughter Natalie when she was a baby and a portrait of the Virgin Mary
don’t exactly fit the stereotypical image of social media influencers
But with more than 2 million followers on TikTok
they’re breaking barriers and challenging perceptions of masculinity
share the story of their journey from closeted teens to beloved internet personalities
spotted a photo of Vasquez in a friend’s work locker and was instantly smitten
It took him a month to finally spot Vasquez’s contact information on a friend’s phone at a party
He borrowed a pen and wrote the number down on his hand
Adam Vasquez and Trino Garcia
A previous version of this story misidentified Trino Garcia and Adam Vasquez in photo captions.
everyone Vasquez hung out with was dealing with drugs in some way
Vasquez himself was addicted to crystal meth
Vasquez says he saw in Garcia a lifeline to normalcy: “Everyone I associated with always did what I did
So I never had that outlet to escape that.”
but Vasquez’s body started to ache and shake due to withdrawal
I went beneath the table and made myself feel better.” As he continued using
Garcia recognized a kindred spirit yearning for acceptance
as a single dad to baby Natalie and in not being accepted by his family after coming out
and it made me feel really connected with him.”
Vasquez continued to battle his addiction; Garcia would catch him using drugs under the table or find drugs in his pockets
Garcia and Natalie were part of the motivation to stop using
“I had our daughter that I wanted to be better for,” he says
“It’s an honor to find somebody that you’re with for so long,” Vasquez says
“There are so many levels to us at this point: We are friends
Raising Natalie as a gay couple in Bakersfield came with its own set of challenges
When they went to parent meetings at Natalie’s school
Vasquez was always the “uncle,” because they didn’t want their daughter to have unnecessary attention or trouble
a mother of Natalie’s elementary school classmate confronted Garcia
saying she thought Natalie should go to counseling because she was missing a mother in her life
but he also felt fear: “What if we did something wrong?” he would ask himself
though she has noticed the judgments of others due to her fathers’ appearance
when she went shopping with Garcia and Vasquez
people would follow them to make sure her dads weren’t stealing anything
“People actually found it really interesting and cool” that she had two dads
“[Queerness] has been something normal in my life,” she says
she was surrounded by Garcia and Vasquez’s friends and would go to Pride parades with them
“I never felt I was missing out on something
because they were just so involved in my life.”
“We raised her as two parents,” Vasquez says
“Trino was there with her to get her nails and hair done
I would work hard to make sure she had everything she needed.”
They also encouraged her passion for dance
Garcia would record her moving to music; later
Adam Vasquez and Trino Garcia dance alongside rapper Snow Tha Product at Beaches WeHo in West Hollywood
“I wanted her to see life the way I didn’t see it,” he says
“I wanted her to dream big and express herself.”
from Bakersfield so Natalie could get better opportunities in dance
who was working at both Red Robin and Chili’s
transferred to the Whittier locations so he could have a better commute
“I became one of the best servers at Red Robin and Chili’s,” he says
would stay late to clean the studios to cover her tuition
Their dedication paid off — Natalie is now in her second season as a dancer for the L.A
She also recently introduced her first boyfriend to her dads
who “have always been a big support system,” she says
“They were willing to drop everything they had in Bakersfield to come over to L.A
[for me to] pursue what I really want to do.”
Growing up as the only sons in their Catholic families
Vasquez and Garcia both felt the weight of cultural expectations and religious beliefs
was the only one in his family who initially accepted his queerness
She said he was “quiet,” “sensitive” and “a sweetheart” as a kid
When their father saw Garcia was attracted to the “girls’ stuff” of his four sisters
he put him on baseball and basketball teams to make him act more like a “boy,” she says
“Those kids were my brother’s bullies,” Brenda Garcia says
and I could feel so much pressure on him.”
Garcia went to church to confess to the priest that he found himself attracted to other boys
to be the tough Chicano man that his father wanted him to be
Garcia intentionally had “become bad,” says Brenda Garcia
had lots of girlfriends and started to smoke
“[my attraction to men] didn’t go away,” he says
“It wasn’t until my daughter was born that the reality told me I need to wake up” and accept who he was
that he came out and left Oxnard for Bakersfield
Trino Garcia and Adam Vasquez have a sip while talking to each other outside of a friend’s house in Compton.
Vasquez also was the only boy in his family
His father left the family for another woman when Vasquez was little
“He had a baby with her and called that son the ‘junior,’ but I was his first boy,” he says
so I turned to drugs to fill up the void.”
He said he found it hard to pray when he realized his sexual orientation and that he got into drugs as he felt he had turned his back on God
especially as the only son in a Catholic family
her initial reaction was devastating: “I don’t have a son anymore.” He moved out that same day
but I’ve never been so happy in my life,” Vasquez says
“Why [is] loving this man going to send me to hell?”
In a world that often equates gayness with flamboyance, their look — baggy clothes and a style rooted in Chicano culture — might challenge stereotypes about what it means to be gay.
But beneath the tough exterior lie hearts filled with love and a desire for acceptance
butterflies and words like “love” and the name of their daughter
This juxtaposition of traditional masculinity and open vulnerability is at the core of their appeal
They’re showing a generation of young men that there’s no one way to be gay
Their viral moment in 2023 catapulted them into the spotlight in a way they never expected
His notifications “just went crazy” with people sending likes and following the couple
people started to line up and take pictures with them
Adam Vasquez during L.A. Pride at Los Angeles State Historic Park in June.
The attention has led to a sense of freedom for the couple
share their outfits and post daily life or travel vlogs
They’re using their platform to challenge stereotypes and promote acceptance
and we’re not hiding in the closet anymore,” Vasquez says
going to places where we shouldn’t be embraced
During this interview at a Starbucks in Van Nuys
Their message resonates beyond the LGBTQ+ community
They’ve been welcomed at lowrider shows and spoken at prisons
breaking down barriers and fostering understanding
“Maybe you don’t agree with it,” Vasquez says
someone that’s been hiding all their lives.”
Vasquez and Garcia balance their social media presence with their day jobs as community integration facilitators at the organization Social Vocational Services
working with individuals with developmental disabilities and taking them on recreational activities
“Living in the social media world can make you lose yourself fast,” says Garcia
it makes us grateful for where we’re at in life
Trino Garcia and Adam Vasquez dance while filming a TikTok and singing along to the Usher song “Burn.”
Their plan is not to be influencers
forever known as “the guys on the bridge,” he adds
but to use their social media presence to “speak about something powerful.” They’re also planning on writing a book about their love story and where they came from
In October, the couple released their first original rap song, “Vibe Out”; Natalie dances in the music video
“Adam is rapping a lot in this piece,” Garcia says
they’ve found public reaction heartwarming and encouraging
But getting full acceptance from their families may be a lifelong journey
it’s bittersweet: “The people that I wanted to see me is my mother
continues to be a supportive force in his life
When her youngest daughter asked her about Garcia and Vasquez’s relationship
Is that going to make you change the way you see your uncle?’ And she said no.”
is proud of the men Vasquez and Garcia have become and the life they’ve built together
For their Nov. 30 nuptials
Garcia and Vasquez have invited about 20 family members and friends from their personal circle
“People have been embracing us,” Garcia says
“And we want to celebrate with the people that have been healing us.”
Following a short ceremony onstage, several DJs will play at the “club-vibe party with music,” the couple says. Tickets are available to the public via Ticketmaster, and the event will be livestreamed on TikTok.
Along with the wedding, there’s another milestone in the works: After Nov. 30, Vasquez will be Adam Issac Vasquez Garcia, “so that the three of us can be Garcia,” he says.
“We’re like a beautiful plant that grows slowly and blooms more beautifully,” Garcia says of their relationship. “We were like two broken pieces,” Vasquez adds, “and coming together we became a full, complete person.”
Grace Xue was a 2024 Features/Lifestyle intern at the Los Angeles Times. She is from Beijing, China, and completed her master’s degree at Northwestern University Medill School of Journalism. Xue previously worked for the Mac Weekly and Star Tribune in Minnesota.
Michael Blackshire was a 2023-24 photography fellow at the Los Angeles Times. He previously interned at the Washington Post and Chicago Tribune and his work has been published in the New York Times, the Guardian, the Wall Street Journal, Bloomberg, Huffington Post and New York Magazine. Blackshire is from Kentucky and spent his teenage years in Metro Atlanta. He received his higher education from Western Kentucky University and Ohio University.
Michigan-based magician Trino will bring magic and variety to the Wealthy Theatre on April 11th and 12th.
"Magic Fest" will feature two nights of unforgettable performances from the family-friendly group Presto
as well as Grand Rapids' monthly magic show
Presto is a variety lineup comprised of hypnotist Chrisjones
Amaze & Amuse is closing out its season featuring magician and mind reader Noah Sonie.
Amaze & Amuse is recommended for ages 13+ due to language and themes
Both events begin at 7 p.m., with doors opening at 6 p.m. General admission tickets start at $30, with limited family four-packs available for $100; tickets are available at grmagicfest.com.
Latino artist Trino Mora, one of the pioneers of Latin rock, has passed away at the age of 81 after a long battle with prostate cancer
Mora's passing was confirmed by his brother in a heartfelt statement shared on social media.
The news shocked fans of Spanish-language music, especially in Mora's homeland of Venezuela, where he returned after studying at the Military Academy of Fort Lauderdale during high school. Mora later pursued university studies in economics
Mora shared his lifelong dream: "I wanted to be like Marlon Brando
and sing like Elvis Presley." Mora achieved a tropicalized version of this vision
blending daring rock-inspired looks with theatrical flair
He famously donned a bright red velvet suit when he competed in the 1967 edition of 'La Voz Juvenil,' a precursor to today's global franchise 'The Voice.'
The experience catapulted him to fame as one of the most recognizable faces in Venezuela's alternative music scene.
Mora's musical style was eclectic, spanning rock ballads, soul, jazz, and even touches of the Argentine star Sandro's influence. Beyond music, his personal life captured public attention, including high-profile romances with TV actresses like Mayra Alejandra ('Leonela') and Jeanette Rodríguez ('Cristal,' 'La Dama de Rosa')
including a revolutionary rock mass dedicated to Elvis Presley
He also performed extensively in the United States
where his musical productions gained widespread acclaim in the 1960s and 1970s
Trino Mora's legacy as a trailblazer in Latin rock continues to resonate
inspiring generations of artists across Latin America and beyond
His passing marks the end of an era for fans of his genre-defining music and larger-than-life persona
UCSD Park & Market
Featured presenters from Mexico and the US illuminate unique binational insights that resonate with fans through humor and intelligence, tackling timeless themes from social commentary to daily life and focusing on bringing people together. At the center is the art and humor of iconic cartoonists Jis y Trino: always humor about the reality of society in contemporary Mexico.
The Jis y Trino’s Big Cartoon Jam programs are presented in partnership with the Consulado General de México en San Diego
An iceberg is a piece of ice broken off from a glacier that floats freely in open water
and this definition also provides an appropriate name for the database table format project Iceberg
Open-source Apache Iceberg provides full database functionality on top of cloud object stores
It exemplifies how the separation of storage from compute in modern data stacks has allowed scalable
cost-effective computing and improved interaction between various systems
Two engineers at Netflix Inc. created Iceberg to overcome the challenges they encountered using existing data lake tools such as Apache Spark and Apache Hive. The engineers needed a solution to navigate their employer's massive media streaming files stored in Amazon S3.
“We had the same problems that everybody else did, but 10 times worse,” recalled Ryan Blue
co-founder and chief executive officer of Tabular Technologies Inc
“Every request to [Amazon’s] S3 was not seven milliseconds, it was 70 milliseconds … so all the things that you had to do really quickly to make sure your database doesn’t lie to you …”
Blue was joined in conversation by Dain Sundstrom, chief technology officer of Starburst Data Inc.
They discussed the evolution and significance of separating storage from compute in modern data stacks
“We had to go and look at Hive tables and say, ‘That model of keeping track of what’s in our table is too simplistic; it’s not going to work in a world based on object stores,’” Blue said. “We designed for the constraints we were working with.”
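The contrast Blue describes can be made concrete with a toy model: Hive-era tables discover their data by listing a path prefix, while an Iceberg-style table reads the exact file list from snapshot metadata. This is a deliberately simplified sketch of the idea, not how Iceberg is actually implemented; every name below is invented.

```python
# A toy object store: keys are object paths.
object_store = {
    "warehouse/events/part-0.parquet": b"...",
    "warehouse/events/part-1.parquet": b"...",
    "warehouse/events/part-2.parquet.tmp": b"...",  # an in-flight write
}

def hive_style_scan(prefix):
    """Hive-era approach: discover table data by listing a path prefix.
    Slow on object stores, and it can observe uncommitted files."""
    return sorted(key for key in object_store if key.startswith(prefix))

# Iceberg-style approach: the table's metadata records the exact files
# belonging to the current snapshot, so readers never list directories.
snapshot_manifest = [
    "warehouse/events/part-0.parquet",
    "warehouse/events/part-1.parquet",
]

def iceberg_style_scan(manifest):
    return list(manifest)

print(hive_style_scan("warehouse/events/"))   # sees the .tmp file too
print(iceberg_style_scan(snapshot_manifest))  # only committed files
```

Because readers trust the manifest rather than a directory listing, an Iceberg-style table can guarantee a consistent view even while writes are in flight, which is exactly the property a slow, eventually consistent object store cannot give a listing-based table.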
For Sundstrom, the impetus for the open-source Trino distributed query engine came from the need to replace Facebook Inc.’s 300-petabyte Hive data warehouse, which served ad-hoc analytics queries over big data file systems.
“[Hive] was a great way to have less super-skilled engineers be able to interact with the massive data sets Facebook had,” Sundstrom said. “We came in to build a much more powerful distributed system using traditional database techniques.”
As the commercial developer of the distributed query engine based on Trino, Starburst has taken steps in recent months to make it easier for organizations to build applications on top of data lake architectures. In November, the company released a set of new features that provided unified data ingestion
governance and sharing on a single platform
“You get into this problem of there’s just too much data to reasonably process; the queries are too big, and [customers] want to move to a more cost-effective solution,” Sundstrom said.
“Often people start off with Starburst by just hooking it up and exploring the data in their existing platform because Trino supports federation.”
In April, Starburst announced that it would release a fully managed Icehouse data lake on its cloud. Icehouse combines Trino and Iceberg storage to support near-real-time data ingestion into a managed Iceberg table at a petabyte scale
“You can explore your data [and] you can play with it,” Sundstrom said. For both engineers, Iceberg is the best of the data lake formats.
“Iceberg has two things that the other formats lack in some respect,” Blue said. One is its governance, “where it’s owned and controlled by the Apache Software Foundation. We really wanted this project to be something that was a foundational layer in data architecture, and we knew we needed to have a neutral community.”
Here is the complete conversation, part of The Road to Intelligent Data Apps series:
Secure Mobility Intelligence with Enhanced Data Accessibility
BOSTON, Dec. 18, 2024 /PRNewswire/ -- Starburst today announced that Arity, a company specializing in mobility data analytics, has significantly optimized its data infrastructure using Starburst Galaxy.
This collaboration has enabled Arity to achieve 10X faster data processing
and double data accessibility for non-engineering teams
all while efficiently managing petabytes of driving data in a unified data lakehouse
"We're now able to make data accessible across our organization while significantly reducing costs and processing time. Starburst's SQL-based platform lowers the barrier to data insights, helping us operate more effectively and securely," said Reza Banikazemi.
Overcoming Data Scalability and Accessibility Barriers
Arity, known for managing one of the world's most extensive datasets of driving behavior connected to insurance claims data, encountered several challenges as it scaled its operations.
Their existing solution was becoming increasingly resource-intensive and costly
The technical complexity created accessibility barriers
limiting data usage to engineering teams and causing bottlenecks that restricted other departments
Analyzing large datasets could take 45 minutes or more, which hindered critical time-sensitive applications such as mobility insights and real-time analytics.
Arity needed a scalable data solution that adhered to data privacy and compliance standards, and chose Starburst Galaxy, a fully managed data lakehouse leveraging Trino.
This switch brought several significant benefits:
"Switching to Starburst Galaxy has streamlined our processing and improved data governance and lineage
Starburst has enabled us to shift from a complex data environment to a simpler
SQL-based approach that more of our teams can use effectively," said Banikazemi
Future Plans: Expanding Starburst Galaxy for Advanced Analytics and Innovative Insurance Solutions
Arity plans to expand its integration of Starburst Galaxy across various business units
This expansion will enable more advanced real-time analytics
support the creation of targeted advertising audiences
and facilitate the development of innovative insurance models based on driving behavior
"We've reduced data processing costs by two-thirds and accelerated data access by 10X. We can get insights in minutes instead of hours, transforming our decision-making process," said Banikazemi.
He also highlighted the enhanced data accessibility: "The SQL-based interface has effectively doubled our data users, empowering teams beyond just engineering." On scalability and security, he added: "Starburst lets us keep data secure and compliant, expanding access without sacrificing security."
"Arity's success demonstrates the transformative potential of Starburst Galaxy for companies dealing with massive datasets," said Justin Borgman
"Our mission is to provide organizations with a powerful
and secure analytics platform that accelerates insights and reduces costs
We're proud to support Arity as they continue to innovate and set new benchmarks in mobility analytics."
Starburst is the leading end-to-end data platform to securely access and share data for analytics and AI across hybrid environments.
Starburst empowers the most data-intensive and security-conscious organizations, like Comcast and 7 of the top 10 global banks, to democratize data access.
With the Open Hybrid Lakehouse from Starburst
enterprises globally can easily discover and use all their relevant business data to power new applications and analytics across risk mitigation
For additional information, please visit https://www.starburst.io/
The Trino solar park is located in the municipality of the same name in the province of Vercelli.
It’s also adjacent to the former "Galileo Ferraris" thermoelectric power plant
The cooling towers at the old plant are still visible
The plant uses about 160,000 double-sided photovoltaic modules with state-of-the-art technology to maximize renewable production
3,096 trackers enable the panels to "chase" the Sun: their tilt changes according to its position
The Trino park is integrated with a lithium-ion battery storage system (BESS) with a capacity of 25 MW and a storage capacity of 100 MWh
This will ensure the adequacy of the power system and provide ancillary services to the grid: the services that are necessary for ensuring the security of the entire power system.
The solar park has a capacity of nearly 87 MW for an annual production of about 130 GWh
this means that the energy needs of about 47,000 households can be met by green energy
It will avoid the emission of 56,000 tons of CO2 into the atmosphere and the use of 29 million cubic meters of gas
which will be replaced with locally produced renewable energy
Technology: ground-based solar photovoltaics + batteries
Capacity: 87 MW + 25 MW of electrochemical storage
Gas avoided: about 29 million cubic meters per year
*Estimate made assuming the average annual consumption of a typical family
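The household figure is easy to sanity-check: spreading the park's stated 130 GWh of annual production over 47,000 households implies an assumed consumption of roughly 2,800 kWh per household per year, consistent with typical family usage.

```python
annual_production_kwh = 130e6   # 130 GWh expressed in kWh
households = 47_000

# Implied average annual consumption per household
implied_consumption = annual_production_kwh / households
print(round(implied_consumption))  # prints 2766 (kWh per household per year)
```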
The construction of the solar park led to the implementation of major conservation and restoration work on some of the buildings in Borgo Leri Cavour
This is where the summer residence of Camillo Benso Conte di Cavour (a key figure in Italy’s “Risorgimento” in the nineteenth century) once stood
The project also led to the creation of a new wetland area and a perimeter hedge along the plant fences
Wooded areas inside and adjacent to the planting area that are strategic for the area's fauna and flora were upgraded
significant reforestation work was carried out on substantial public and private land
The new plant was also built thanks to those citizens who contributed to the financing of the work
through the "Renewable Choice" crowdfunding campaign that was launched by Enel Green Power in 2022
The success of the campaign made it possible to meet and far exceed the fundraising goal (demand was 50% greater than the set target)
As a result, the citizens who joined the initiative will receive a return on their invested capital.
The industry veteran has led the Sarabande Foundation to provide scholarships
mentorship and studio space to over 130 creatives
Fashion industry veteran Trino Verkade is the chief executive of the Sarabande Foundation, the charitable organisation set up by late designer Alexander McQueen to support emerging creative talent.
Born in Liverpool, Verkade joined McQueen as his first employee. She worked with him until his death in 2010, playing a multi-faceted role in supporting the business as McQueen transformed into a global fashion icon and helping him first establish the Sarabande Foundation in 2006.
After McQueen’s death in 2010, Verkade took on executive roles at Thom Browne and Mary Katrantzou, but she returned as CEO of the Foundation in 2017.
Under Verkade's guidance, the Foundation has evolved, reflecting McQueen's multidisciplinary approach by supporting a diverse range of artists, from fashion designers to sculptors. It has offered scholarships, mentorship and studio space to over 130 creatives.
The people shaping the global fashion industry
curated by the editors of The Business of Fashion
based on nominations and on-the-ground intelligence from around the world
In modern data architectures, the need to manage and query vast datasets efficiently, consistently, and accurately is paramount. For organizations that deal with big data processing, managing metadata becomes a critical concern. This is where Hive Metastore (HMS) can serve as a central metadata store
playing a crucial role in these modern data architectures
HMS has become a foundational component for data lakes
integrating with a diverse ecosystem of open source and proprietary tools
In non-containerized environments, there was typically only one approach to implementing HMS—running it as a service in an Apache Hadoop cluster. With the advent of containerization in data lakes through technologies such as Docker and Kubernetes
multiple options for implementing HMS have emerged
allowing organizations to tailor HMS deployment to their specific needs and infrastructure
In this post, we will explore the architecture patterns and demonstrate their implementation using Amazon EMR on EKS with the Spark Operator job submission type, guiding you through the complexities to help you choose the best approach for your use case.
We will use a Standalone Hive Metastore to illustrate the architecture and implementation details of the various design patterns; any reference to HMS refers to a Standalone Hive Metastore.
The HMS broadly consists of two main components: the metastore service, which exposes a Thrift interface that clients use to read and update metadata, and a backing relational database that persists that metadata.
Containerization and Kubernetes offer various architecture and implementation options for HMS. We'll use Apache Spark as the data processing framework to demonstrate three architectural patterns, but these patterns aren't limited to Spark and can be applied to any data processing framework that relies on HMS for managing metadata and accessing catalog information.
the driver is responsible for querying the metastore to fetch table schemas and locations
then distributes this information to the executors
Executors process the data using the locations provided by the driver
never needing to query the metastore directly
in the three patterns described in the following sections
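As an illustrative sketch of that interaction, the driver-side configuration that enables Hive catalog support might look like the following. The property names are the standard Spark/Hive ones; the URI passed in is a hypothetical placeholder, not a value from this post:

```python
# Minimal sketch: the Spark properties that point the driver at HMS.
# Only the driver resolves hive.metastore.uris; executors receive table
# locations from the driver and read the data directly.
def hms_spark_conf(metastore_uri: str) -> dict:
    """Build the Spark properties that enable Hive catalog support."""
    return {
        "spark.sql.catalogImplementation": "hive",
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }

# Placeholder cluster-local service DNS name
conf = hms_spark_conf("thrift://hive-metastore.default.svc.cluster.local:9083")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```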
In this pattern, HMS runs as a sidecar container within the same pod as the data processing framework, such as Apache Spark. This approach uses Kubernetes multi-container pod functionality, allowing both HMS and the data processing framework to operate together in the same pod. The following figure illustrates this architecture, where the HMS container is part of the Spark driver pod.

This pattern is suited for small-scale deployments where simplicity is the priority. Because HMS is co-located with the Spark driver, it reduces network overhead and provides a straightforward setup. However, it's important to note that in this approach HMS operates exclusively within the scope of the parent application and isn't accessible by other applications. In addition, row conflicts might arise when multiple jobs attempt to insert data into the same table simultaneously, so you should make sure that no two jobs write to the same table at the same time.

Consider this approach if you prefer a basic architecture. It's ideal for organizations where a single team manages both the data processing framework (for example, Apache Spark) and HMS, and there's no need for other applications to use HMS.
In this pattern, HMS runs in multiple pods managed through a Kubernetes Deployment, typically within a dedicated namespace in the same data processing EKS cluster. The following figure illustrates this setup, with HMS decoupled from Spark driver pods and other workloads.

This pattern works well for medium-scale deployments where moderate isolation is enough and compute and data needs can be handled within a few clusters. It provides a balance between resource efficiency and isolation, making it ideal for use cases where scaling metadata services independently is important. Additionally, this pattern works well when a single team manages both the data processing frameworks and HMS, ensuring streamlined operations and alignment with organizational responsibilities.

By decoupling HMS from Spark driver pods, it can serve multiple clients, such as Apache Spark and Trino, while sharing cluster resources. However, this approach might lead to resource contention during periods of high demand, which can be mitigated by enforcing tenant isolation on HMS pods.
In this architecture pattern, HMS is deployed in its own EKS cluster, separate from the data processing clusters, using a Kubernetes Deployment and exposed as a Kubernetes Service through the AWS Load Balancer Controller. The data processing clusters configure HMS as an external service.

This pattern suits scenarios where you want a centralized metastore service shared across multiple data processing clusters: it allows different data teams to manage their own data processing clusters while relying on the shared metastore for metadata management. By deploying HMS in a dedicated EKS cluster, you gain complete isolation for metadata services and the flexibility to operate and manage HMS as its own independent service. While this approach offers clear separation of concerns and the ability to scale independently, it also introduces higher operational complexity and potentially increased costs because of the need to manage an additional cluster.

Consider this pattern if you have strict compliance requirements, need to ensure complete isolation for metadata services, or want to provide a unified metadata catalog service for multiple data teams. It works well in organizations where different teams manage their own data processing frameworks and rely on a shared metadata store; the separation enables specialized teams to focus on their respective areas.
In the remainder of this post, you will explore the implementation details for each of the three architecture patterns, using EMR on EKS with the Spark Operator job submission type as an example. Note that this implementation hasn't been tested with other EMR on EKS Spark job submission types.

You will begin by deploying the common components that serve as the foundation for all the architecture patterns. Next, you'll deploy the components specific to each pattern. Finally, you'll execute Spark jobs to connect to the HMS implementation unique to each pattern and verify the successful execution and retrieval of data and metadata. We've automated the deployment of the common infrastructure components so you can focus on the essential aspects of each HMS architecture, and we'll provide detailed information to help you understand each step, simplifying the setup while preserving the learning experience.

The setup uses three EKS clusters. Both analytics-cluster and datascience-cluster serve as data processing clusters that run Spark workloads, while the hivemetastore-cluster hosts the HMS. You will use analytics-cluster to illustrate the sidecar and cluster-dedicated patterns, and all three clusters to demonstrate the external HMS pattern. You can find the codebase in the AWS Samples GitHub repository.
Before you begin, make sure that the following prerequisites are in place.

Begin by setting up the infrastructure components that are common to all three architectures. Once you have completed the setup of these common components, you will deploy the components specific to each architecture and execute Apache Spark jobs to validate the successful implementation.
To implement HMS using the sidecar container pattern, the Spark application requires setting both sidecar and catalog properties in the job configuration file. To verify the pattern, you will submit Spark jobs in analytics-cluster; the jobs will connect to the HMS service running as a sidecar container in the driver pod.

To implement HMS using the cluster-dedicated pattern, the Spark application requires setting the HMS URI and catalog properties in the job configuration file. Again you will submit Spark jobs in analytics-cluster; this time the jobs will connect to the HMS service in the same data processing EKS cluster.

To implement the external HMS pattern, the Spark application requires setting an HMS URI for the service endpoint exposed by hivemetastore-cluster. Submit Spark jobs in analytics-cluster and datascience-cluster; the jobs will connect to the HMS service in the hivemetastore-cluster. Use the following steps for analytics-cluster and then for datascience-cluster to verify that both clusters can connect to the HMS on hivemetastore-cluster.
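To summarize the three patterns, the main difference on the Spark side is the metastore endpoint each job points at. The following sketch uses hypothetical hostnames; only the sidecar's localhost value is fixed by the pattern itself:

```python
# Illustrative only: how hive.metastore.uris differs across the three
# patterns. Hostnames other than localhost are placeholders.
METASTORE_URIS = {
    # Sidecar: HMS runs inside the Spark driver pod, reached over localhost.
    "sidecar": "thrift://localhost:9083",
    # Cluster dedicated: HMS runs behind a Kubernetes Service in the same
    # EKS cluster, addressed via cluster-local DNS.
    "cluster-dedicated": "thrift://hive-metastore.hms-ns.svc.cluster.local:9083",
    # External: HMS runs in its own EKS cluster, exposed through a load
    # balancer endpoint reachable from the data processing clusters.
    "external": "thrift://hms-lb.example.internal:9083",
}

def metastore_uri(pattern: str) -> str:
    """Return the example metastore endpoint for a given pattern."""
    return METASTORE_URIS[pattern]

for pattern in METASTORE_URIS:
    print(f"{pattern}: {metastore_uri(pattern)}")
```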
To avoid incurring future charges from the resources created in this tutorial, clean up your environment after you've completed the steps by running the cleanup.sh script, which will safely remove all the resources provisioned during the setup.
In this post, we've explored design patterns for implementing the Hive Metastore (HMS) with EMR on EKS and the Spark Operator, each offering distinct advantages depending on your requirements. Whether you choose to deploy HMS as a sidecar container within the Apache Spark driver pod, as a Kubernetes Deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, isolation, and operational complexity.

We encourage you to experiment with these patterns in your own setups, adapting them to fit your unique workloads and operational needs. By understanding and applying these design patterns, you can optimize your Hive Metastore deployments for performance and security in your EMR on EKS environments. Explore further by deploying the solution in your AWS account, and share your experiences and insights with the community.
Avinash Desireddy is a Cloud Infrastructure Architect at AWS. He is passionate about building secure applications and data platforms and helping customers containerize applications.

Suvojit Dasgupta is a Principal Data Architect at AWS. He leads a team of skilled engineers in designing and building scalable data solutions for AWS customers, and he specializes in developing and implementing innovative data architectures to address complex business challenges.
This Valentine's Day weekend is the perfect time to get a ticket to an upcoming Amaze and Amuse show at Wealthy Theatre. Local magician Trino will be joined by special guest magician "Just Joe" Chasney from Detroit for two shows on Saturday, and if you arrive early you can see some exclusive close-up magic from sleight-of-hand artist Tyler Grey before the show. Amaze and Amuse shows occur monthly at Wealthy Theatre, with each performance featuring Trino alongside a different special guest act. Purchase tickets at grcmc.org.
The case of Trinidad "Trino" Marín, the former husband of the late Mexican-American singer Jenni Rivera and father to her three eldest children (Chiquis, Jacqie, and Mikey), has reignited controversy within the Mexican-American community. Marín was arrested in 2006 and later convicted of sexually abusing his daughters; he was sentenced in 2007 to 31 years in prison. Marín has sought parole multiple times, though he has since pursued a different strategy for early release.

His case has recently resurfaced in the media, drawing attention after Chiquis Rivera visited him in prison. The visit, documented in her docuseries Chiquis Sin Filtro on the streaming platform Vix, was part of her healing journey before her marriage to Emilio Sánchez.
The story took another turn this past weekend when Rosie Rivera, Jenni's sister and one of Marín's victims, revealed in a live broadcast that an unexpected visitor had arrived at her mother's house. She recounted that an official had visited, and that she later called the sergeant, who explained that the District Attorney's office wished to discuss Marín's case, indicating that they were revisiting the sentencing, with the potential for a revised sentence or early release depending on a judge's decision. The official reportedly explained that the sentence review process mirrored the one granted to the Menendez brothers. While Rosie has previously stated that Marín's release is his burden to bear and doesn't directly affect her, she conveyed clear unease with the prospect of his early release.
Chiquis has been vocal about her complex relationship with her father and her path to forgiveness. She's openly discussed the difficult but ultimately healing decision to reconcile with Marín, saying that forgiving him was essential for her inner peace. She described the visit as a significant personal milestone, expressing pride in her courage to face him and in the change she perceived in him.
"I'm proud of myself for having the courage to come, something I've wanted for so long," Chiquis shared in her docuseries, showing the vulnerability and strength that have marked her public life.
Meanwhile, questions linger over Marín's potential early release. While no official decision has been confirmed, the possibility alone has stirred strong emotions within Jenni Rivera's family and among the public who continue to follow this complex story of trauma.
Open data lakehouse company Starburst has hired fresh leadership for its key growth markets. Starburst is positioning itself to solve the problem of an ever-increasing volume of enterprise data siloed across different platforms, including machine learning and AI data, and the company says its Trino platform can help such businesses with data management. Customers on Starburst's books include the likes of Halliburton.

The provider has now appointed Deron Miller as senior vice president and general manager for the Americas and Asia-Pacific regions, and has also brought in Steve Williamson as SVP and general manager for the EMEA region. Prior to Starburst, Miller served as chief revenue officer for Delphix, an enterprise data company acquired by Perforce Software. He has also held revenue leadership roles at GE.
“Starburst is one of the most exciting technologies that I have seen in over 20 years,” beamed Miller. “By enabling our customers to access data directly across different platforms…”

Williamson served as general manager of EMEA at Apptio, and has also served in executive roles at Acquia. Williamson said: “European enterprises have to navigate challenges in data privacy, whilst keeping up with the dynamic needs of the market. Starburst offers Trino and Apache Iceberg to simplify access to enterprise data to drive faster revenue growth.”

Earlier this year, Starburst appointed Steven Chung as president and Adam Ferrari as senior vice president of engineering.
Verkade is CEO of the Sarabande Foundation, which supports emerging artists and fashion designers (recent protégées include Standing Ground's Michael Stewart and Aaron Esh). It's also no coincidence that Verkade settled on the market-stall-lined Mouassine district: it's just a few steps from the renowned El Fenn Hotel (Madonna hired out its entire 42 rooms for her 60th birthday in 2018), whose owners first introduced Verkade to Marrakech more than 15 years ago. The silk Hermès two-piece and surrealist gold Schiaparelli necklace she's wearing on the day I visit would indeed suit an evening on the hotel's rooftop.
A pair of self-portraits by Shirin Fathi hang beside a tadelakt fireplace
We dart down a narrow passageway off the main thoroughfare. With most of London's fashion scene on speed dial, it's only natural that the brass plaque next to a discreet front door was the brainchild of Verkade's old pal. Inside, the entrance hall is home to a hand-tiled sandstone herringbone staircase with black and gold art deco edging. A wavy hand-forged wrought-iron grille climbs the entirety of the stairwell, one of the few details she has retained from the home's previous owners. "The whole place felt completely outdated and in need of some serious love," she says.
The heart of the house is undoubtedly its quintessentially Moorish courtyard, a tranquil sanctum shaded by banana trees and palms. The fragrance of the honeysuckle that hangs over an inviting plunge pool fills the air. "I call it the cocktail pool," Verkade says, and we take a seat on rattan chairs (a local flea market discovery) to sip gin and tonics. "I bought them for 50 euros and painted them black," she says, "but I think they came out pretty good." Throughout the eight-room riad, you'll find repurposed treasures that signal Verkade's unpretentious glamour.
There are also bespoke artisanal pieces by international artists who have come through Sarabande: two self-portraits by Iranian Canadian artist Shirin Fathi hang on either side of a black Moroccan tadelakt plaster fireplace, and nearby sits a sculpture by Matija Čop, who has a 2024 residency at Sarabande's east London studios. A hand-carved bobbin sofa wraps invitingly round the room, commissioned from a local furniture-maker and upholstered in silk chenille from Italy, while below there's a Berber rug in abstract colours. There's a traditional plasterwork lattice ceiling, and low-slung, stripe-covered banquettes are strewn with cushions made using fabrics from Rogers & Goffigon, who work with some of the finest weavers in Europe.
The leafy courtyard is looked over on four sides
Trino discovered the rattan chair at a flea market and painted it black
“What I really love about the riad is that you can walk around all its four sides and overlook the courtyard,” Verkade says, “as most are built next to another building with a wall on one side.” Upstairs, we pause on a balcony overlooking the courtyard, before throwing open a set of intricately carved black and gold doors, made locally in the city using a historic Syrian technique, which guard her bedroom (one of the house's four suites). The walls have been stained a beguiling rose pink. Below our feet: a hand-stitched leather floor crafted by a local artisan Trino fondly calls “Mr Magic”. “It was left over from an old Alexander McQueen show,” she says.
We enter the “yellow room” (my favourite of the bedrooms), which features seductive ochre tadelakt walls and a chalky black floor. The scalloped bed frame is nothing short of a masterpiece. “I bought it in 2012 when I first started working at Thom Browne in New York and I had it upholstered in an Alexander McQueen fabric,” Verkade says
The black and white embroidered bedspread was made by three local women and took four weeks to complete by hand
The verdant courtyard glimpsed through an archway
As we head up the final flight of tiled stairs to the roof terrace, it strikes me that Trino is a true creative in her own right, as well as being the CEO of a pioneering arts charity.
The terrace is perfectly overwhelmed with mature fruit trees, a shaded seating area with its own fireplace, and a fully fledged green-tiled kitchen. "I always say to my artists that if you were to throw a dart into the future and visualise where you want to land, Marrakech would be it for me," Verkade says, her signature burnished red hair aflame in the Moroccan evening sun. It's certainly an incredible place to land.
Verkade was Lee Alexander McQueen's first hire in 1994. As she puts more than 50 key pieces from her archive up for auction, she tells Joe Bromley the true stories behind them.
Kim Kardashian, Lady Gaga and Zendaya might well have their trigger fingers ready: one of the largest and most rarefied personal collections of Lee Alexander McQueen's work today goes under the hammer in London. The more than 50 lots going at Kerry Taylor Auctions, many of which were whisked straight off the catwalk and have been wrapped in acid-proof paper and kept in boxes ever since, make up The Trino Verkade Collection, the wardrobe of McQueen's first ever employee.

Verkade is a London fashion legend. Today, I find her, with her lipstick-rouge locks cut into a stern fringe, perched on a stone-grey sofa beside two taxidermy birds, at the top of her Sarabande Foundation studios. She started her career as McQueen's PR ("but it was very clear the role was: you do everything," she says) in 1994 and worked with him until his death in 2010.
At the Sarabande Foundation, she has succeeded in growing it into a vital organ for London's artistic support system; insiders refer to her as a "fairy godmother" for blossoming talents looking for a leg up. On a sunny afternoon at their Haggerston plot (Sarabande doubled in size, opening its Tottenham outpost in 2022), the 30 studios are humming with art school energy. One artist is photographing a lookbook with Lennon Gallagher upstairs, Dean Hoy is deconstructing teddy bears and inverting them into sculptures, and Matija Čop has crafted orgasm audios into hanging sculptures. There's a traditional oil painter or stained sugar-sculptor to be met at every turn.

"…we wouldn't be able to help," says Verkade. "My life has changed — helping the artists in my life now." This week, Haggerston will be taken over by their What Now series: a kind of chic careers fair (less Goldman Sachs scouts, more Stella McCartney and the Tate) run for free for recent graduates.
“I used to wear Alexander McQueen pieces all the time,” she continues. “But I just don't have the lifestyle for evening gowns and power suits today. I’m not trying to show power to the artists.”
“I first met Lee in 1994 through [stylist] Katy England. So I assisted in The Birds [SS95] – it was the spare tyre on my old Fiat Panda that we used to roll the tyre marks on the models. Highland Rape was the second show that I did. I was just out of college and thought nothing of it. I remember heading to a floristry on Sloane Street that supplied all the heather and flowers — the only thing that we could do to dress up the catwalk in the tents in those days was to literally scatter it with heather to make it feel like the Scottish Highlands.
then there’s a beautiful purple suede bumster skirt — one of the first he ever made
“I remember Lee starting at Givenchy very well — I did the contract for him. I was in the basement of the office where the studios were and we got a phone call from Louis Vuitton. We thought they wanted a meeting for him to make a handbag – so we went along to Mayfair, and they said: ‘we’d like you to do Givenchy’. It was a lot, because with it Lee was doing 10 collections a year. But he learnt stuff there — he loved the ability to create things without worrying about money.
This is a cocktail dress from his first Haute Couture collection for Givenchy. I remember him being really upset because everybody said ‘Oh…’. He said the whole point of Haute Couture is you can have any colour you like; he wanted to show a blank palette with the cuts and the concept.
“I don't think it was ever very clear what my job spec was — it was just clear you do everything. I let him do the design because he was so much better than I was. I remember leaving him at the pattern cutting table, and when I would come in in the morning he would have created something start to finish overnight.
“This is the runway piece; the green chiffon dress from The Man Who Knew Too Much. He said ‘wear this’. I remember it was long on the runway, but he cut it short for me before we went out. He would also make me dye my hair to match my dresses — not green. It was fun — I was going along with that.”
“It was such an emotional time [because the collection was shown posthumously]. Lee had already worked on the pieces that were shown, and the thinking was: don't add anything else. I think everybody really wanted something from it. They are works of art — but I always wanted to wear the pieces. They were designed to make a woman feel strong. He wanted women to stride into a room and for men to fear them.”
kerrytaylorauctions.com
I often say every one of us has a story to tell, and I believe that after almost six and a half years since Alice's Table began I have proven this. The stories of where we have come from are incredibly fascinating – our connections from a common past we share both rich and revealing of…
The Gibraltar Chronicle is a daily newspaper published in Gibraltar since 1801
It is one of the world's oldest English language newspapers to have been in print continuously
Our print edition and e-paper is published daily except Sundays
The Gibraltar Chronicle (Newspaper) Ltd is licensed by the Gibraltar Government's Office of Fair Trading
Trino Motion Pictures is set to captivate audiences with ‘Love Lockdown’, a riveting romantic drama that explores the complexities of love. The film follows Yemi, whose carefully crafted life is thrown into chaos when a COVID lockdown traps him with an old flame; the chance encounter reignites unresolved emotions and forces Yemi to grapple with the meaning of commitment and the choices that define us. Set against the backdrop of the global pandemic, the emotionally charged film explores themes of trust. At its heart lies the central question: do we marry the one who feels right for us…
Produced by an award-winning producer and directed by Lyndsey Efejuku, Love Lockdown features stellar performances from Andrew Yaw Bunting and a supporting cast who deliver a layered narrative that balances raw emotional intensity with a relatable premise. The film promises to leave audiences reflecting on the intricacies of love long after the credits roll.

The movie is also a testament to a legacy of excellence in storytelling by Trino Motion Pictures, which includes celebrated titles such as Silva. Known for producing high-quality African films that resonate with audiences across the continent, Trino Motion Pictures continues to push boundaries with stories that are both entertaining and thought-provoking.
When you use Trino on Amazon EMR or Athena, you get the latest open source community innovations along with proprietary AWS optimizations. Starting from Amazon EMR 6.8.0 and Athena engine version 2, AWS has been developing query plan and engine behavior optimizations that improve query performance on Trino. In this post, we compare Amazon EMR 6.15.0 with open source Trino 426 and show that TPC-DS queries ran up to 2.7 times faster on Amazon EMR 6.15.0 Trino 426 compared to open source Trino 426. We also explain a few of the AWS-developed performance optimizations that contribute to these results.

Benchmark queries were run sequentially on two different Amazon EMR 6.15.0 clusters: one with Amazon EMR Trino 426 and the other with open source Trino 426. Both clusters used 1 r5.4xlarge coordinator and 20 r5.4xlarge worker instances.

Our benchmarks show consistently better performance with Trino on Amazon EMR 6.15.0 compared to open source Trino: the total query runtime of Trino on Amazon EMR was 2.7 times faster. The following graph shows performance improvements measured by the total query runtime (in seconds) for the benchmark queries. Many of the TPC-DS queries demonstrated gains over five times faster than open source Trino, and some queries showed even greater improvement. The next graph shows the top 10 TPC-DS queries with the largest improvement in runtime (presented this way for succinct representation and to avoid skewing the graph toward outliers).
Now that we understand the performance gains with Trino on Amazon EMR, let's delve deeper into some of the key innovations developed by AWS engineering that contribute to these improvements.

Choosing a better join order and join type is critical to query performance because it affects how much data is read from a particular table, how much data is transferred to the intermediate stages through the network, and how much memory is needed to build a hash table to facilitate a join. Join order and join algorithm decisions are typically made by cost-based optimizers, which use statistics to improve query plans by deciding how tables and subqueries are joined. However, statistics are often unavailable, stale, or too expensive to collect on large tables. Amazon EMR and Athena instead use S3 file metadata to optimize query plans: the metadata is used to infer small subqueries and tables in the query while determining the join order or join type.
Consider an example query whose syntactical join order is store_sales joins store_returns joins call_center. With the Amazon EMR join type and order selection optimization rules, the optimal join order is determined even if these tables don't have statistics. For the preceding query, if call_center is considered a small table after estimating its approximate size through S3 file metadata, EMR's join optimization rules will join store_sales with call_center first and convert the join to a broadcast join, speeding up the query and reducing memory consumption. Join reordering also minimizes the intermediate result size, which helps to further reduce the overall query runtime.
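To see why the broadcast choice matters, here is a toy Python illustration (not EMR or Trino code, and the rows are made up): the small call_center table becomes an in-memory hash table, and the large store_sales table is streamed past it, so the large table never has to be shuffled or hashed:

```python
# Toy broadcast hash join: build side = small table, probe side = large table.
store_sales = [
    {"ss_item": 1, "ss_call_center": "cc1", "ss_price": 10.0},
    {"ss_item": 2, "ss_call_center": "cc2", "ss_price": 20.0},
    {"ss_item": 3, "ss_call_center": "cc1", "ss_price": 30.0},
]
call_center = [
    {"cc_id": "cc1", "cc_name": "North"},
    {"cc_id": "cc2", "cc_name": "South"},
]

# "Broadcast" side: the small table becomes a hash table (on every worker,
# in a real engine).
cc_by_id = {row["cc_id"]: row for row in call_center}

# Probe side: the large table is scanned once; each row probes the hash table.
joined = [
    {**sale, "cc_name": cc_by_id[sale["ss_call_center"]]["cc_name"]}
    for sale in store_sales
    if sale["ss_call_center"] in cc_by_id
]
print(len(joined))  # 3 joined rows
```

The memory cost is proportional to the small table only, which is why misjudging which table is "small" (the problem statistics or S3 file metadata solve) can be so expensive.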
In our mission to innovate on behalf of customers, Amazon EMR and Athena frequently release performance and reliability enhancements in their latest versions. Check the Amazon EMR and Amazon Athena release pages to learn about new features and enhancements.

Bhargavi Sagi is a Software Development Engineer on Amazon Athena. She joined AWS in 2020 and has been working on different areas of Amazon EMR and Athena engine v3. Sushil Kumar Shivashankar is the Engineering Manager for the EMR Trino and Athena Query Engine team. He has been focusing on the big data analytics space since 2014.
Spot Instances are best suited for running stateless and fault-tolerant big data applications, such as Apache Spark with Amazon EMR, which are resilient against Spot node interruptions, without the need for complex and expensive processes of copying data to a single location.

Before Project Tardigrade, Trino queries failed whenever any of the nodes in a Trino cluster failed, and there was no automatic retry mechanism with iterative querying capability: failed queries had to be restarted from scratch. As a result, the cost of failures of long-running extract, transform, and load (ETL) and batch queries on Trino was high in terms of completion time. Spot Instances were therefore not appropriate for long-running queries with Trino clusters and only suited short-lived Trino queries.
Enabling this feature mitigates Trino task failures caused by worker node failures due to Spot interruptions or On-Demand node stops: Trino now retries failed tasks using intermediate exchange data checkpointed on Amazon S3 or HDFS.

Trino runs a query by breaking up the run into a hierarchy of stages, which are implemented as a series of tasks distributed over a network of Trino workers. This pipelined execution model runs multiple stages in parallel and streams data from one stage to another as the data becomes available. This parallel architecture reduces end-to-end latency and makes Trino a fast tool for ad hoc data exploration and ETL jobs over very large datasets. The following diagram illustrates this architecture.
You can save significant costs for your ETL and batch workloads running on EMR Trino clusters with a blend of Spot and On-Demand Instances, and you can reduce time-to-insight by running more worker nodes on Spot Instances at lower cost. For example, a long-running query on EMR Trino that takes an hour can be finished faster by provisioning more worker nodes on Spot Instances.

Fault-tolerant execution in Trino is disabled by default; you can enable it by setting a retry policy in the Amazon EMR configuration. Trino supports two types of retry policies: QUERY and TASK. An exchange manager stores and manages the intermediate exchange data, which is spilled beyond the in-memory buffer size of worker nodes. Amazon EMR release 6.9.0 and later uses HDFS as an exchange manager.
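As a sketch of what enabling this looks like, the following configuration classifications follow the general shape of Amazon EMR configuration JSON; treat the classification and property names as values to verify against the EMR release guide for your release, and the HDFS path as a placeholder:

```python
import json

# Sketch of EMR configuration enabling Trino fault-tolerant execution.
emr_configurations = [
    {
        "Classification": "trino-config",
        "Properties": {
            # TASK retry policy: retry individual failed tasks instead of
            # restarting the whole query from scratch.
            "retry-policy": "TASK",
        },
    },
    {
        "Classification": "trino-exchange-manager",
        "Properties": {
            # Where intermediate exchange data is spooled (HDFS here;
            # an S3 bucket is the other common choice).
            "exchange.base-directories": "hdfs:///exchange-spooling",
        },
    },
]

print(json.dumps(emr_configurations, indent=2))
```

You would pass a JSON document like this when creating the cluster (for example, via the console's software settings or the CLI's `--configurations` option).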
For this post, we create an EMR cluster with the following architecture, provisioning the following resources using Amazon EMR and AWS FIS. We use the new Amazon EMR console to create an EMR 6.9.0 cluster; for more information about the new console, refer to Summary of differences. Complete the following steps to create your EMR cluster.

We need Hue's web-based interface for submitting SQL queries to the Trino engine, and HDFS on core nodes to store intermediate exchange data for Trino's fault-tolerant runs. With instance fleets, the number of vCPUs of the EC2 instance type is used by default as the count toward the total target capacity of a core or task fleet. For example, an m5.xlarge instance type with 4 vCPUs is considered 4 units of capacity by default.
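That capacity arithmetic can be made concrete with a small helper. The vCPU counts below are just the default weighting rule applied to two example instance types (one from this post, one hypothetical extra):

```python
# Default instance-fleet weighting: each instance contributes its vCPU
# count as units toward the fleet's target capacity.
VCPUS = {"m5.xlarge": 4, "r5.4xlarge": 16}

def instances_needed(instance_type: str, target_units: int) -> int:
    """How many instances of one type satisfy a target capacity."""
    vcpus = VCPUS[instance_type]
    return -(-target_units // vcpus)  # ceiling division

print(instances_needed("m5.xlarge", 16))  # 4 instances x 4 vCPUs = 16 units
```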
so no sizing configuration is needed or available for the primary node on the Amazon EMR console
Alternatively, you can create a JSON config file with the configuration
and select the file path from its S3 location by choosing Load JSON from Amazon S3
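As an illustration, a minimal configuration file for enabling fault-tolerant execution might look like the following sketch. The classification names follow Amazon EMR conventions; verify the exact property names and values against the EMR and Trino documentation before use:

```json
[
  {
    "Classification": "trino-config",
    "Properties": {
      "retry-policy": "TASK",
      "exchange.compression-enabled": "true"
    }
  },
  {
    "Classification": "trino-exchange-manager",
    "Properties": {}
  }
]
```

The retry-policy property enables task-level fault tolerance, and the trino-exchange-manager classification lets Amazon EMR 6.9.0 and later set up HDFS as the exchange manager for spooled intermediate data.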
Let’s understand some optional settings for query performance optimization that we have configured:
For more details of Trino’s fault-tolerant configuration parameters, refer to Fault-tolerant execution
We’ll use this tag to target Spot Instances in the cluster with AWS FIS
The EMR cluster will take a few minutes to be ready in the Waiting state
We now use the AWS FIS console to simulate interruptions of Spot Instances in the EMR Trino cluster and showcase the fault-tolerance of the Trino engine
This means FIS will interrupt targeted Spot Instances after 2 minutes of running the experiment
When your EMR cluster is in the Waiting state
connect to the Hue web interface for Trino queries and the Trino web interface for monitoring
Alternatively, you can submit your Trino queries using trino-cli after connecting via SSH to your EMR cluster’s primary node
In this post, we use the Hue web interface for running queries on the EMR Trino engine
you can see the query editor on Hue’s web interface
Amazon EMR configures the Trino web interface on the Trino coordinator (EMR primary node) to use port 8889
On the Trino web interface, we can see six active Trino workers (two core and four task nodes of the EMR cluster) and no running queries
Run select * from system.runtime.nodes from the Hue query editor to see the coordinator and worker nodes’ status and details
We can see all cluster nodes are in the active state
To test the fault tolerance on Spot interruptions, submit a long-running query
While it runs, you can see the query running on six active worker nodes (two core On-Demand and four task nodes on Spot Instances)
Once started, the AWS FIS experiment will move to the Completed state, and it will trigger stopping all four Spot Instances (four Trino workers) after 2 minutes
we can observe in the Trino web UI that we have lost four Trino workers (task nodes running on Spot Instances) but the query is still running with the two remaining On-Demand worker nodes (core nodes)
Without the fault-tolerant configuration in EMR Trino
the whole query would fail with even a single worker node failure
We can see four Spot worker nodes with the status shutting_down
Trino starts shutting down the four Spot worker nodes as soon as they receive the 2-minute Spot interruption notification sent by the AWS FIS experiment
It will start retrying any failed tasks of these four Spot workers on the remaining active workers (two core nodes) of the cluster
The Trino engine will also not schedule tasks of any new queries on Spot worker nodes in the shutting_down state
The Trino query will keep running on the remaining two worker nodes and succeed despite the interruption of the four Spot worker nodes
Amazon EMR will replenish the stopped capacity (four task nodes) by launching four replacement Spot nodes
Now let’s increase Trino workers capacity from 6 to 10 nodes by manually resizing EMR task nodes on Spot Instances (from 4 to 8 nodes)
We run the same query on a larger cluster with 10 Trino workers
Let’s compare the query completion time (wall time in the Trino Web UI) with the earlier smaller cluster with six workers
We can see 32% faster query performance, with a wall time of 1.57 minutes on the 10-worker cluster
You can run more Trino workers on Spot Instances to run queries faster to meet your SLAs or process a larger number of queries
With Spot Instances available at discounts up to 90% off On-Demand prices
your cluster costs will not increase significantly vs
running the whole compute capacity on On-Demand Instances
To clean up, navigate to the Amazon EMR console and delete the cluster emr-trino-cluster
In this post, we showed how you can configure and launch EMR clusters with the Trino engine using its fault-tolerant configuration
Trino worker nodes can be run as EMR task nodes on Spot Instances with resilience
You can configure a well-diversified task fleet with multiple instance types using the price-capacity optimized allocation strategy
This will make Amazon EMR request and launch task nodes from the most available
lower-priced Spot capacity pools to minimize costs
We also demonstrated the resilience of EMR Trino against Spot interruptions using an AWS FIS Spot interruption experiment
EMR Trino continues to run queries by retrying failed tasks on remaining available worker nodes in the event of any Spot node interruption
With fault-tolerant EMR Trino and Spot Instances
you can run big data queries with resilience
you can also add more compute on Spot to adhere to or exceed your SLAs for faster query performance with lower costs compared to On-Demand Instances
Ashwini Kumar is a Senior Specialist Solutions Architect at AWS based in Delhi
Ashwini has more than 18 years of industry experience in systems integration
with more recent experience in cloud architecture
He helps customers optimize their cloud spend
He focuses on architectural best practices for various workloads with services including EC2 Spot
Dipayan Sarkar is a Specialist Solutions Architect for Analytics at AWS
where he helps customers modernize their data platform using AWS Analytics services
He works with customers to design and build analytics solutions
enabling businesses to make data-driven decisions
Enel Green Power has begun operations at a solar photovoltaic (PV) plant with an installed capacity of 87MW in Trino
a municipality in the province of Vercelli in northern Italy
The facility is equipped with 160,000 PV panels
It is expected to generate 130 gigawatt hours of electricity annually
meeting the energy needs of 47,000 households
The plant is expected to offset the emission of 56,000t of CO₂ each year
thereby contributing to reduced environmental impact and advancements in Italy’s renewable energy sector
The plant’s advanced bifacial photovoltaic modules are designed to optimise renewable energy generation
The project is complemented by a 25MW lithium-ion battery energy storage system (BESS)
which has a storage capacity of 100 megawatt hours
Enel Green Power also disclosed plans for an even larger storage system at the Trino site
The initiative is part of a broader investment strategy
with the group allocating more than €12bn globally up to 2026 to bolster renewable energy development
The Trino photovoltaic plant project received local support through the “Scelta Rinnovabile” crowdfunding campaign
The company stated: “Thanks to widespread participation
the fundraising target was met and vastly exceeded
with the final amount raised being 150% of the initial target
the local residents involved in the initiative will begin to recoup their investment.”
Enel Green Power also revealed that its global emission-free electricity generation achieved a new high of 82% in the first quarter of 2024
The Sarabande CEO talks early days in London and expanding the program to support young artists.
From the early '90s and through his untimely death in 2010, McQueen and Verkade were a powerhouse duo
If he was the brain's creative right hemisphere
it's like it happened to somebody else," she says
her fiery red tresses brightening up the video chat
"What an incredible journey I was able to help with."
Her storied career is the result of equal parts creative tenacity and keen understanding of the fashion business
Read on to learn how Verkade went from being Alexander McQueen's first employee to steering Sarabande's philanthropy.
Photo: Sølve Sundsbø/Courtesy of Sarabande Foundation
What was your relationship like with fashion growing up
I grew up in Liverpool, so I used to go to the nightclub as a young person and would make myself clothes to wear. Everything was secondhand
just like it is now — I think we've kind of come full circle
we went to thrift shops and bought 1940s suits and wore them with something modern
That was the only way that you could really find fashion: to customize and make things yourself.
I eventually moved to London to go and study
It was a different world from how I'd grown up
where fashion was very thrift and very make-it-yourself
I started to work for a very unknown young designer at the time named Alexander McQueen
because that was in the early '90s: If you went into fashion
it wasn't a career as such — it wasn't like you're going to be rich and famous
How did you navigate the industry back then
I didn't know what I wanted to do [after college]
and I think that's something super important
to know what you are good at and what you excel in
and I also did fashion design — but I was not as good as Alexander McQueen
But I could do my job better than him because we all have a skill
I loved working with somebody with really crazy ideas
but give me somebody who's going a bit crazy
and I love it because I can build a structure around that
That’s what made me and Lee a good combination
you shouldn't do that.' I'd be like
I love the idea of building order in chaos
I originally did PR after leaving college, and I also did a bit of styling with Katy England, which is how I met Lee. I tried lots of different jobs
I did my time working with lots of different people
When you work for a really small team and a really young designer
the idea of having just one role [pretty much doesn't exist]. You're doing PR
and you're speaking about maybe getting a sponsor to give you some tartan to make an outfit
That's got as much to do with being open and just getting on
when jobs are changing very quickly and the world is changing very quickly
it's almost more important not what you know
but what you're willing to learn very quickly
What have been your biggest career milestones
That was 16 years of amazingly hard work and really enjoying most of it
I also loved working with Thom [Browne] — that was a very special moment for me because it was something that I threw myself into: a strange country with a team I didn't know
They became like a second family to me.
I did the Met show ["Alexander McQueen: Savage Beauty"] with Andrew Bolton
I'd never worked on an exhibition of that level
I was guided by an amazing man like Andrew.
it's so fulfilling to work with such amazing creatives and to be so close to them all the time
It's great to be around people who just keep on creating and still believe that there's something new to be done.
You've spoken about how Lee wanted to support creatives financially based on how that majorly helped his own career
What was it like to start Sarabande and navigate how to memorialize a friend while moving the mission forward
The one thing I did know is that we had to do scholarships
that scholarships — because he'd been helped out by his aunt — are important to make sure that people have the opportunity to study who couldn't go to college. We want everybody to have the right to go to college if they can
not just people with the financial means.
We need people from all different backgrounds to be in our pipeline. We give a lot of scholarships to people from all countries [to] study in the U.K.
in two courses: fashion at Central Saint Martins and art at Slade School of Fine Art.
The idea was to create something like the '90s
which was creators from all different backgrounds of art
photography — people who just made really different things and all worked together and had a real respect for each other.
That's the main concept of the studios: You can look at the person next door to you
and not only do they inspire you with their work
but perhaps they have a different approach to how they market that work or how they want to be seen
You can take something from that and realize there's not just one pathway with what you want to do.
it's eye-wateringly expensive to rent any space right now
Giving a financially accessible space is really important
A snapshot of one-on-one mentorship at Sarabande's What Now
I feel like it does something incredibly nourishing
those collectives historically have always been a home base for building who you become and who you are
Having a studio next door to you with somebody who's doing their own thing and being able to just have a coffee with them and chat about it...
We do these fun things called 'artist crits'
because so many artists say they have no one to talk to about their work
Everybody comes to bring a piece of work and talk to other artists about it.
[Our artists are] working together and collaborating all the time. Right now, one is taking a showroom in Paris and asking all the other designers if they want to come and share it
Instead of waiting for a big organization to take a commission
they're doing it together through the foundation
We have a mantra: 'Once a Sarabande artist
always a Sarabande artist.' It doesn't matter if you were an artist here four years ago
If you want to borrow the photography equipment
if you need help with a contract — we never say
'Your time passed.' It's your time always
We're interested in it not just being in London
We want to live in that world where it's all different voices with all different backgrounds
What makes you believe in an artist's vision
There's an endless amount of people coming out with amazing ideas that don't have to be commercial
That's not the world that we need to live in
We need people to dream and show that there are truly artistic approaches to their practice
It's not just about making something that people want to buy — sometimes it's just something that you want to look at and go
I never thought the world could look through somebody's eyes that way,' and that might inspire somebody else to look through in a similar lens
A real dedication to the craft is really important.
What have been some of the biggest lessons you've learned from the artists at Sarabande and from building the foundation
It's really hard when you're creating stuff
yet the stores don't buy it — still doing it because you believe in it is a really strong quality
We just reduced all of the ticket prices in London for talks
but we'll work that out because that's the right thing to do. Do the right thing first
It seems like a bit of an unusual approach for the business of a charity
What were some of the biggest lessons that you learned when you were working with Lee that you hold to this day
and then he wouldn't be afraid of changing his mind
but the concept of failure was only in not trying
Is there a best piece of advice you've received or something that you've learned that's stayed with you
Not taking shortcuts and doing something that you truly believe in is really important
Not expecting things to happen in a certain timeframe. Allowing time to learn and to do things. Time is one of your best friends
This interview has been edited and condensed for clarity
Want the latest fashion industry news first? Sign up for our daily newsletter.
By Andrea Bossi
the importance of self-love and how to build sustainable talent pipelines
from screen printing T-shirts in high school to becoming a 2023 CFDA Award nominee
Managing data through a central data platform simplifies staffing and training challenges and reduces costs, but it can become a bottleneck because central teams may not understand the specific needs of a data domain
whether it’s because of data types and storage
or the specific technologies needed for data processing
One of the architecture patterns that has emerged recently to tackle this challenge is the data mesh architecture
which gives ownership and autonomy to individual teams who own the data
One of the major components of implementing a data mesh architecture lies in enabling federated governance
which includes centralized authorization and audits
Apache Ranger is an open-source project that provides authorization and audit capabilities for Hadoop and related big data applications like Apache Hive
Trino, on the other hand, is a highly parallel and distributed query engine, and provides federated access to data by using connectors to multiple backend systems like Hive, Amazon Redshift, and Amazon OpenSearch Service
Trino acts as a single access point to query all data sources
By combining Trino query federation features with the authorization and audit capability of Apache Ranger, you can enable federated governance. This allows multiple purpose-built data engines to function as one
with a single centralized place to manage data access controls
This post shares details on how to architect this solution using the new EMR Ranger Trino plugin on Amazon EMR 6.7
Trino allows you to query data in different sources
This feature enables you to have a single point of entry for all data sources that can be queried through SQL
The following diagram illustrates the high-level overview of the architecture
This architecture is based on four major components:
Before getting started, you must have the following prerequisites. For more information, refer to the Prerequisites and Setting up your resources sections in Introducing Amazon EMR integration with Apache Ranger
To set up the new Apache Ranger Trino plugin, you first need to configure Trino user impersonation
it’s used to impersonate SQL calls submitted by AD users
Warning: The impersonation feature should always be used carefully to avoid giving any or all users access to elevated privileges
To enable the Trino Redshift connector, create the file redshift.properties under /etc/trino/conf.dist/catalog on all Amazon EMR nodes and restart the Trino server
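A minimal redshift.properties catalog file might look like the following sketch; the endpoint, database name, and credentials are placeholders for your own environment:

```
connector.name=redshift
connection-url=jdbc:redshift://<redshift-cluster-endpoint>:5439/dev
connection-user=<username>
connection-password=<password>
```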
In this section, we go through an example where the data is distributed across Amazon Redshift for dimension tables and Hive for fact tables
We can use Trino to join data between these two engines
First, in Amazon Redshift, let’s define a new dimension table called Products and load it with data:
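The original SQL is not shown in this excerpt; a minimal sketch with a hypothetical schema, run against Amazon Redshift, could look like this:

```sql
CREATE TABLE public.products (
  product_id   INT,
  product_name VARCHAR(100),
  category     VARCHAR(50)
);

INSERT INTO public.products VALUES
  (1, 'Laptop', 'Electronics'),
  (2, 'Desk Chair', 'Furniture');
```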
Then use the Hue UI to create the Hive external table Orders:
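As a sketch, the Hive DDL could look like the following; the S3 location and the columns are hypothetical:

```sql
CREATE EXTERNAL TABLE orders (
  order_id   INT,
  product_id INT,
  quantity   INT,
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<your-bucket>/orders/';
```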
Now let’s use Trino to join both datasets:
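A federated join in Trino references each catalog explicitly. A sketch with hypothetical table and column names, assuming a Hive catalog named hive and a Redshift catalog named redshift:

```sql
SELECT o.order_id,
       p.product_name,
       o.quantity
FROM hive.default.orders o
JOIN redshift.public.products p
  ON o.product_id = p.product_id;
```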
The following screenshot shows our results
Apache Ranger supports policies to allow or deny access based on several attributes, including static attributes of users, groups, and resources
as well as dynamic attributes like IP address and time of access
In addition, the model supports authorization based on the classification of resources, such as PII
Another feature is the ability to allow users to access only a subset of rows in a table or restrict users to access only masked or redacted values of sensitive data
Examples of this include the ability to restrict users to access only records of customers located in the same country where the user works
or allow a user who is a doctor to see only records of patients that are associated with that doctor
With these features, you can enable row filtering and column masking of data in Amazon Redshift tables
The example policy masks the firstname column
and applies a filter condition on the city column to restrict users to view rows for a specific city only
When using this solution, keep in mind the following limitations; further details can be found here:
If you can’t log in to the EMR cluster’s node as an AD user
and you get the error message Permission denied
This can happen if the SSSD service has stopped on the node you are trying to access. To fix this, connect to the node using the configured SSH key-pair or by making use of Session Manager and run the following command
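The command itself is not shown in this excerpt; restarting SSSD with systemd would typically look like the following:

```
sudo systemctl restart sssd
sudo systemctl status sssd
```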
If you’re unable to download policies from Ranger admin server
and you get the error message Error getting policies with the HTTP status code 400
This can happen either because the certificate has expired or because the Ranger policy definition is not set up correctly
If the setup worked previously, it’s likely that the certificates have expired
You will need to perform the following steps to address the issue
The issue can also be caused by a misconfigured Ranger policy definition
The Ranger admin service policy definition should trust the self-signed certificate chain
Make sure the following configuration attribute for the service definitions has the correct domain name or pattern to match the domain name used for your EMR cluster nodes
If the EMR cluster keeps failing with the error message Terminated with errors: An internal error occurred
check the Amazon EMR primary node secret agent logs
In this example, the cluster is failing because the specified CloudWatch log group doesn’t exist
A query run through trino-cli might fail with the error Unable to obtain password from user
This issue can occur due to incorrect realm names in the etc/trino/conf.dist/catalog/hive.properties file
Check the domain or realm name and other Kerberos related configs in the etc/trino/conf.dist/catalog/hive.properties file
also check the /etc/trino/conf.dist/trino-env.sh and /etc/trino/conf.dist/config.properties files in case any config changes have been made
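For reference, the Kerberos-related entries in hive.properties typically look like the following sketch; the realm, principal names, and keytab path are placeholders and must match your own KDC setup:

```
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@<REALM>
hive.metastore.client.principal=trino/_HOST@<REALM>
hive.metastore.client.keytab=/etc/trino.keytab
```

A realm name that doesn't exactly match the one configured in the cluster's krb5.conf is a common cause of the "Unable to obtain password from user" error described above.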
Clean up the resources created either manually or by the AWS CloudFormation template provided in GitHub repo to avoid unnecessary cost to your AWS account
You can delete the CloudFormation stack by selecting the stack on the AWS CloudFormation console and choosing Delete
This action deletes all the resources it provisioned
If you manually updated a template-provisioned resource
you may encounter some issues during cleanup; you need to clean these up independently
A data mesh approach encourages the idea of data domains where each domain team owns their data and is responsible for data quality and accuracy
This draws parallels with a microservices architecture
Building federated data governance like we show in this post is at the core of implementing a data mesh architecture
Combining the powerful query federation capabilities of Apache Trino with the centralized authorization and audit capabilities of Apache Ranger provides an end-to-end solution to operate and govern a data mesh platform
In addition to the already available Ranger integration capabilities for Apache SparkSQL, Amazon S3, and Apache Hive, starting from the 6.7 release, Amazon EMR includes plugins for Ranger Trino integration. For more information, refer to EMR Trino Ranger plugin
Varun Rao Bhamidimarri is a Sr Manager on the
AWS Analytics Specialist Solutions Architect team
His focus is helping customers with adoption of cloud-enabled analytics solutions to meet their business requirements
Outside of work, he loves spending time with his wife and two kids
likes to meditate, and recently picked up gardening during the lockdown
Partha Sarathi Sahoo is an Analytics Specialist TAM at AWS based in Sydney
He brings 15+ years of technology expertise and helps Enterprise customers optimize Analytics workloads
He has extensively worked on both on-premises and cloud big data workloads along with various ETL platforms in his previous roles
He also actively works on conducting proactive operational reviews around the Analytics services like Amazon EMR
Anis Harfouche is a Data Architect at AWS Professional Services
He helps customers achieve their business outcomes by designing
building, and deploying data solutions based on AWS services
Zomato is an India-based restaurant aggregator, food delivery
and dining-out company with over 350,000 listed restaurants across more than 1,000 cities in India
The company relies heavily on data analytics to enrich the customer experience and improve business efficiency
Zomato’s engineering and product teams use data insights to refine their platform’s restaurant and cuisine recommendations
improve the accuracy of waiting times at restaurants
speed up the matching of delivery partners and improve the overall food delivery process
Within Zomato, different teams have different requirements for data discovery based upon their business functions
For example, the number of orders placed in a specific area required by a city lead team
queries resolved per minute required by the customer support team, or the most searched dishes on special events or days required by marketing and other teams
Zomato’s Data Platform team is responsible for building and maintaining a reliable platform which serves these data insights to all business units
In this post, we will walk you through an overview of Trino and Druid
how they fit into the overall Data Platform architecture, and the migration journey onto AWS Graviton based instances for these workloads
We will also cover challenges faced during migration
business gains in terms of cost savings and better performance along with future plans of Zomato on Graviton adoption for more workloads
Trino is a fast
distributed SQL query engine for querying petabyte scale data
implementing massively parallel processing (MPP) architecture
It was designed as an alternative to tools that query Apache Hadoop Distributed File System (HDFS) using pipelines of MapReduce jobs
but Trino is not limited to querying HDFS only
It has been extended to operate over a multitude of data sources
including Amazon Simple Storage Service (Amazon S3)
traditional relational databases and distributed data stores including Apache Cassandra
When Trino runs a query, it does so by breaking up the execution into a hierarchy of stages
which are implemented as a series of tasks distributed over a network of Trino workers
This reduces end-to-end latency and makes Trino a fast tool for ad hoc data exploration over very large data sets
Trino coordinator is responsible for parsing statements, planning queries, and managing Trino worker nodes. Every Trino installation must have a coordinator alongside one or more Trino workers. Client applications including Apache Superset and Redash connect to the coordinator via Presto Gateway to submit statements for execution
The coordinator creates a logical model of a query involving a series of stages
which is then translated into a series of connected tasks running on a cluster of Trino workers
Presto Gateway acts as a proxy/load-balancer for multiple Trino clusters
Figure 3 – Zomato’s Data Platform landscape on AWS
Zomato’s Data Platform covers data ingestion
distributed processing (enrichment and enhancement)
batch and real-time data pipelines unification and a robust consumption layer
through which petabytes of data is queried daily for ad-hoc and near real-time analytics
we will explain the data flow of pipelines serving data to Trino and Druid clusters in the overall Data Platform architecture
Data Pipeline-1: Amazon Aurora MySQL-Compatible database is used to store data by various microservices at Zomato. Apache Sqoop on Amazon EMR runs Extract, Transform
Load (ETL) jobs at scheduled intervals to fetch data from Aurora MySQL-Compatible and transfer it to Amazon S3 in the Optimized Row Columnar (ORC) format
performs data enrichment and transformation and writes it in ORC format in Iceberg tables on Amazon S3
Trino clusters then query data from Amazon S3
performs transformations including conversion into ORC format and writes data back to Amazon S3 which is used by Trino clusters for querying
Data Pipeline-4: Zomato’s core business applications serving end users include microservices
Getting near real-time insights from these core applications is critical to serve customers and win their trust continuously
Services use a custom SDK developed by data platform team to publish events to the Apache Kafka topic
two downstream data pipelines consume these application events available on Kafka via Apache Flink on Amazon EMR
Flink performs data conversion into ORC format and publishes data to Amazon S3 and in a parallel data pipeline
Flink also publishes enriched data onto another Kafka topic
which further serves data to an Apache Druid cluster deployed on Amazon EC2 instances
All of the described data pipelines ingest data into an Amazon S3 based data lake
which is then leveraged by three types of Trino clusters – Ad-hoc clusters for ad-hoc query use cases
with a maximum query runtime of 20 minutes
ETL clusters for creating materialized views to enhance performance of dashboard queries
and Reporting clusters to run queries for dashboards with various Key Performance Indicators (KPIs)
ETL queries are run via Apache Airflow with a built-in query retry mechanism and a runtime of up to 3 hours
Druid is used to serve two types of queries: computing aggregated metrics based on recent events and comparing aggregated metrics to historical data
For example, how does a specific metric in the current hour compare to the same hour last week?
The service level objective for Druid query response time ranges from a few milliseconds to a few seconds
Zomato first moved Druid nodes to AWS Graviton based instances in their test cluster environment to determine query performance
Nodes running brokers and middle-managers were moved from R5 to R6g instances and nodes running historicals were migrated from i3 to R6gd instances. Zomato logged real-world queries from their production cluster and replayed them in their test cluster to validate the performance
Zomato saw significant performance gains and reduced cost:
Performance was measured using a typical business hours (12:00 to 22:00 Hours) load of 14K queries
Figure 4 – Overall Druid query performance (Intel x86-64 vs. AWS Graviton)
Similarly, the query performance improvement on the historical nodes of the Druid cluster is shown here
Figure 5 – Query performance on Druid Historicals (Intel x86-64 vs. AWS Graviton)
Under peak load during business hours (12:00 to 22:00 Hours as shown in the provided graph)
Graviton based instances demonstrated close to linear performance resulting in better query runtime than equivalent Intel x86 based instances
This provided headroom to Zomato to reduce their overall node count in the Druid cluster for serving the same peak load query traffic
Figure 6 – CPU utilization (Intel x86-64 vs. AWS Graviton)
The cost of running Druid on AWS Graviton based instances in a test environment, along with the number of instances
instance types, and hourly On-Demand prices in the Singapore region, is shown here
There are cost savings of ~24% running the same number of Graviton based instances
Druid cluster auto scales in production environment based upon performance metrics
so average cost savings with Graviton based instances are even higher at ~30% due to better performance
Figure 7 – Cost savings analysis (Intel x86-64 vs. AWS Graviton)
Zomato also moved their Trino cluster in their test environment to AWS Graviton based instances and monitored query performance for different short and long-running queries
The mean wall (elapsed) time value for different Trino queries is lower on AWS Graviton based instances than on equivalent Intel x86 based instances
Figure 8 – Mean Wall Time for Trino queries (Intel x86-64 vs. AWS Graviton)
The p99 query runtime was reduced by ~33% after migrating the Trino cluster to AWS Graviton instances for a typical business day’s (7am – 7pm) mixed query load with ~15K queries
Figure 9 – Query performance for a typical day (7am – 7pm) load
Zomato’s team further optimized overall Trino query performance by enhancing Advanced Encryption Standard (AES) performance on Graviton for TLS negotiation with Amazon S3
It was achieved by enabling -XX:+UnlockDiagnosticVMOptions and -XX:+UseAESCTRIntrinsics in extra JVM flags
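In practice, these flags go into Trino's JVM configuration (the jvm.config file on the EMR Trino installation), one option per line. Note that -XX:+UnlockDiagnosticVMOptions must appear before the diagnostic flag it unlocks:

```
-XX:+UnlockDiagnosticVMOptions
-XX:+UseAESCTRIntrinsics
```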
As a result, the mean CPU time for queries is lower after enabling the extra JVM flags
Figure 10 – Query performance after enabling extra JVM options with Graviton instances
The team is also testing and benchmarking the workload on a newer Trino version
Zomato has already migrated AWS managed services including Amazon EMR and Amazon Aurora MySQL-Compatible database to AWS Graviton based instances
With the successful migration of two main open source software components (Trino and Druid) of their data platform to AWS Graviton with visible and immediate price-performance gains
the Zomato team plans to replicate that success with other open source applications running on Amazon EC2 including Apache Kafka
This post demonstrated the price/performance benefits of adopting AWS Graviton based instances for high throughput
near real-time big data analytics workloads running on Java-based
open source Apache Druid and Trino applications
Zomato reduced the cost of its Amazon EC2 usage by 30%
while improving performance for both time-critical and ad-hoc querying by as much as 25%
Zomato was also able to right size compute footprint for these workloads on a smaller number of Amazon EC2 instances
with peak capacity of Apache Druid and Trino clusters reduced by 25% and 20% respectively
Zomato migrated these open source software applications faster by quickly implementing customizations needed for optimum performance and compatibility with Graviton based instances
Zomato’s mission is “better food for more people” and Graviton adoption is helping with this mission by providing a more sustainable
and cost-effective compute platform on AWS
This is certainly “food for thought” for customers looking to improve price-performance and sustainability for their business-critical workloads running on Open Source Software (OSS)
architecture and software design with more recent experience in Cloud architecture
He helps customers to optimize their cloud spend
Ankit is a Solutions Architect at AWS with 10 years of experience
The majority of his experience revolves around building scalable data architectures for data ingestion
He is a polyglot programmer and has designed platforms as consumable data services across the organization using the Big Data tech stack
His expertise includes working on the design and deployment of event-driven, batch, and streaming applications, integrating them with ML algorithms using containers on AWS, and helping customers move from ideation to execution
He knows the ins and outs of setting up a variety of Data Platforms at scale at org level and driving innovation with technologies and solutions in the large-scale distributed cloud services and petabyte-scale data processing space
He has around six years of experience and specializes in the design and implementation of complex platform solutions
Ayush is responsible for ensuring that the platform is designed and optimized to handle the immense scale and complexity of modern data requirements
He is also part of the engineering team that works on ultra-scalable and highly reliable software systems
Rajat Taya is an SDE-III at Zomato's data platform team with over six years of experience
He started as a data scientist and then moved to the data platform and infrastructure team
Rajat's passion for distributed systems and expertise in big data tools like Druid
and Airflow has enabled him to develop efficient and scalable data solutions
He is responsible for designing and developing robust data solutions that can handle massive amounts of data efficiently
Trino creates Mexican comics but sometimes his work needs no translation
Take a two-panel strip featuring "Star Wars’" Han Solo and Chewy in Mexico
On a recent tour he stopped to look at the "Chuy" strip (above) and noted with a smile
"It's just a two-panel joke playing on the funniness that 'Chuy' is a nickname and 'Chewy' is Chewbacca
But Nericcio also noted that Mexico has a rich tradition of sequential art that Trino is a part of
"I guess it would start with José Guadalupe Posada with his printmaking shop in Mexico City," Nericcio explained. "And then moving into the 20th century, you have a cartoonist like Rius
who was known for his left wing satirical revolutionary comics
A collection of Camacho’s work has been gathered for "Trino's World" at the Comic-Con Museum
"It's fantastic because we are a border town," said David Glanzer
spokesperson for Comic-Con International that runs the museum
"We have a lot of people who come up from Tijuana and Mexico to both the museum and Comic-Con
And now to feature an exhibition of a Mexican artist
"Trino’s World" showcases drawings, watercolors, sketches, and objects reflecting a career of more than four decades. Last year Comic-Con gave Trino its Inkpot Award for his contributions to the world of comics
"He's got a very loose and fluid freestyle," Nericcio explained
These are not meticulously planned and drawn panels
One of the pieces in the exhibit is a pen and ink character study (above) of Trino's El Santos, a luchador that he created to pay homage to the real Mexican wrestler El Santo
Nericcio was particularly enamored with the sketch
"The angst is all in the eyes," Nericcio said
"It's the most basic and simple of cartoon renderings
There's almost a power in the pleading in the eyes
You wonder what El Santos is confronting at that moment
It could be his beautiful woman telling him what to do
It could be that he's confronted by the zombies
Take one panel about a soccer match with superheroes
Nericcio translated: "The referee is saying
'Who is your captain?' And then Captain America comes running into the field
He's not above a crappy pun to get the punchline."
And although Trino's strips are in Spanish, much of his work translates for American audiences because a lot of his points of reference are American pop culture, from the "Avengers" to "Star Wars" to "Star Trek." Plus he uses the classic tools of cartooning
"There's not any real attempt to render with precision verisimilitude the human character," Nericcio said
Trino's ability to capture classic human expressions means, without question, we're dealing with a comic artist master
There is a lot that Americans can appreciate in Trino's work but Nericcio has a suggestion
"Bring a pal who speaks Spanish because there are a lot of jokes that are kind of inside-Mexy
Currently there are no translations for any of the comics or for the letter Trino wrote thanking the Museum for the exhibit
But one of the joys of this first binational collaboration and exhibit is seeing how our two cultures overlap and what we share
"Comic-Con International has 'International' in our name," Glanzer explained
"We have guests from all over the world and visitors from all over the world
And I think we'd like to see that at the Museum as well
But visitors will see the universality in Trino’s work
"It almost does the work a disservice to call Trino a Mexican cartoonist," Nericcio said
Art, from the cave drawings of Lascaux to today, is just human beings trying to leave a little trace of themselves behind
And what he leaves behind are some really funny meditations on the human heart and the human soul."
UPDATE (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to query your data in Amazon S3. Learn more »
Customers building data lakes continue to innovate in the ways that they store and access their data
particularly when they are accessing large amounts of data
and data engineers running queries from open source frameworks like Trino want to accelerate access to their data
which lets them spend more time analysing their data and generating insights
Amazon S3 Select is an Amazon Simple Storage Service (Amazon S3) feature that makes it easy to retrieve specific data from an object using simple SQL expressions without having to retrieve the entire object
Trino is an open source SQL query engine that can be used to run interactive analytics on data stored in Amazon S3
you retrieve only a subset of data from an object
reducing the amount of data returned from Amazon S3 and accelerating query performance
On November 21, 2022, AWS announced its upstream contributions to open source Trino
which improves query performance when accessing CSV and JSON data formats
you can use the S3 Select Pushdown performance enhancements to reduce the amount of data that must be transferred and processed by Trino
S3 Select Pushdown enables Trino to “push down” the computational work of projection operations (for example, selecting a subset of columns) and predicate operations (for example, applying filtering in a WHERE clause) to Amazon S3
This allows queries to retrieve only the required data from Amazon S3
In this post, we discuss the performance benchmarks on Trino release 397 with S3 Select using TPC-DS-like benchmark queries at 3 TB scale
We show that queries run up to 9x faster as a result of pushing down the computational load of scanning and filtering data to Amazon S3 when compared to using Trino without S3 Select
You enable S3 Select Pushdown using the s3_select_pushdown_enabled Hive session property
or using the hive.s3select-pushdown.enabled configuration property
The session property overrides the config property
which lets you enable or disable S3 Select Pushdown on a per-query basis
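For example, to enable the feature for a single query session (assuming the Hive catalog is named hive):

```sql
-- Session-level override; takes precedence over the catalog's
-- hive.s3select-pushdown.enabled configuration property
SET SESSION hive.s3_select_pushdown_enabled = true;
```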
To evaluate the performance improvements on Trino with S3 Select
we ran all 99 TPC-DS-like benchmark queries at 3 TB scale on a 33-node r5.16xlarge EMR v6.8.0 cluster patched with Trino release 397 and all of the data stored on Amazon S3
We ran all queries successfully multiple times with and without S3 Select
we monitored the standard deviations for each query run
The average deviation in query runtimes across all 99 queries was 0.70 seconds with S3 Select and 0.99 seconds without S3 Select
we’ll compare the total aggregate runtime and the total aggregate amount of data processed for all 99 queries in the TPC-DS-like 3 TB CSV (uncompressed) dataset
The first graph shows the total aggregate runtime in seconds with and without S3 Select:
The next graph shows the total aggregate data (in terabytes) processed with and without S3 Select:
we found that S3 Select sped up all 99 queries
The maximum query acceleration with S3 Select was 9.2x
the minimum query acceleration with S3 Select was 1.1x
and the average query acceleration was 2.5x
The following graph shows the query speedup for each of the 99 queries:
we found that S3 Select reduced the amount of bytes processed by Trino for all 99 queries
we saw a reduction of 17 TB (99%) of processed data with S3 Select on Query 9
the average reduction in the amount of processed data per query with S3 Select was 2 TB
and the total reduction in processed data across all 99 queries was 200 TB (21x better) with S3 Select
The following graph shows the reduction of data processed for each of the 99 queries with S3 Select:
The performance results provided in this post required no tuning to Amazon EMR or Trino
and all of the results are from default configurations
The following is the default Trino configuration used with our EMR cluster
let’s look at the enhancements that we made to Trino that contributed to these results
Our contributions to Trino improve how Trino sends requests to Amazon S3 by enhancing its use of S3 Select
There are two contributing factors that accelerate the query runtime when S3 Select is used
S3 Select reduces the number of bytes transferred between Amazon S3 and Trino by pushing down the filtering to Amazon S3
Trino retrieves a pre-filtered subset of Amazon S3 data because filtering and projection is performed by S3 Select
using S3 Select to push down the computation work of filtering to Amazon S3 increases Trino’s ability to parallelise projection and predicate operations
In this post, we presented our results from running our TPC-DS-like 3 TB scale benchmark
With the S3 Select Pushdown performance optimizations available in Trino release 397 and later
you can run queries faster than before by using Trino with S3 Select to “pushdown” the computational work of projection and predicate operations to Amazon S3
Our benchmark testing demonstrated up to a 9x performance improvement in query runtime (2.5x on average)
and a 21x overall reduction in the total data processed
If you have a data lake built on Amazon S3 and use Trino today
then you can use S3 Select’s filtering capability to quickly and easily run interactive ad-hoc queries
see the Trino release notes to learn about the enhancements to the Hive connector
Boni Bruno is a Principal Architect and Workload Specialist at AWS
He enjoys developing solution-driven architectures and sharing informative content to the AWS Storage and Analytics community
he was the Chief Solutions Architect for Dell Technologies’ Unstructured Data Solutions Division
where he built numerous solutions around big data processing
Eric Henderson is a Principal Product Manager with Amazon S3 focusing on S3’s serverless compute and event-driven technology
He loves building products that solve problems for customers
A decade ago, a group of engineers at Facebook started the Presto project
introducing a new SQL query engine to help the social media giant to scale
After a decade of growth, the technology is more relevant than ever before, providing an open source approach that enables organizations to easily query data wherever it might reside. But it hasn't been a straight line of success for the Presto project
which has experienced both drama and growth over the last decade
In 2018, after the original founders of the Presto project left Facebook, the technology was divided into two separate projects: PrestoDB and PrestoSQL. The division led to two rival software foundations and, in January 2021, to the rebranding of PrestoSQL as Trino
Multiple commercial vendors of the technology have also emerged over the past decade, including Ahana for PrestoDB and Starburst for Trino
The effect Presto has had on the data community over the past decade is not lost on industry analysts
"Trino and Presto helped drive the rise of the query engine
which helps enterprises maintain fast data access even as their environments grow more complicated," said Kevin Petrie
"The query engine uses familiar SQL commands to retrieve data from data stores at low latency and high throughput."
He added that the Presto and Trino query engines also enable enterprises to support business intelligence and other analytics projects on high volumes of data in environments like data lakes
Hyoun Park, CEO and analyst at Amalgam Insights, said that in his view, Presto represented the first scale-out serverless analytics for distributed data when it was introduced. At the time, it opened up the concept of analytic data from the traditional single source of truth to a more open environment for interactive SQL-based querying on a wide variety of data sources
"The ability to do analytics on the data as a concept owes a great deal of gratitude to PrestoSQL and Trino for both popularizing and demonstrating the concept," Park said
From its earliest days, a key goal of the Presto project was to provide a foundational technology that would last a decade or more, according to Dain Sundstrom
"I actually very clearly remember the conversation we had when we were starting this project," Sundstrom said
'Let's try and make this like PostgreSQL,' which is a database we all really love."
They liked PostgreSQL for its open source community and its longevity as a database
The team that built Presto was familiar with existing analytics databases
but a decade ago there were no good options for open source analytics databases
Analytics databases didn't work well -- if at all -- with Hadoop and cloud object storage
When Facebook attempted to work with large amounts of data, "we realized we really needed to build something," he said
Trino and Presto have been separate projects
enabling SQL queries to scale in a more reliable approach
Among the new capabilities being developed in the Trino community are polymorphic tables
Sundstrom explained that polymorphic tables provide users with a SQL standard way of embedding complex execution capabilities into the middle of a query
"Polymorphic tables provide new and interesting ways to connect into non-SQL data sources," he said
After 10 years, Sundstrom is satisfied that the project he helped to create is continuing to be impactful and to benefit from the contributions of others in the open source community
"I always want to see more people involved
and I think we're doing a really good job," he said
"We just got large contributions from Bloomberg
LinkedIn and several other companies that use Trino at scale internally."
MinIO has devised an exascale DataPOD reference architecture for storing data needed by GPUs doing AI work
The open source object storage software supplier is positioning its scalable 100PiB (112.6PB) unit as an object alternative to parallel file storage systems using GPUDirect to feed data fast to Nvidia’s hungry GPUs – with a nod to Nvidia’s SuperPOD concept
MinIO says it covers all stages of the AI data pipeline: data collection and ingestion
It says: “Networking infrastructure has standardized on 100 gigabits per second (Gbit/sec) bandwidth links for AI workload deployments
Modern day NVMe drives provide 7GB/sec throughput on average, making the network bandwidth between the storage servers and the GPU compute servers the bottleneck for AI pipeline execution performance.”
That’s why GPUDirect was invented by Nvidia
MinIO says don’t bother with complex InfiniBand: “We recommend that enterprises leverage existing
industry-standard Ethernet-based solutions (eg HTTP over TCP) that work out of the box to deliver data at high throughput for GPUs.” These have: “High interconnect speeds (800GbE and beyond) with RDMA over Ethernet support (ie RoCEv2).”
According to MinIO: “Object stores excel at handling various data formats and large volumes of unstructured data and can effortlessly scale to accommodate growing data without compromising performance.” Also MinIO’s viewpoint is that its object storage can easily scale to the exabyte levels that could be needed for AI pipeline storage and that it can perform fast enough
One aspect of this is that MinIO has: “Distributed in-memory caching that is ideal for AI model checkpointing use cases.”
A separate “High-Performance Object Storage for AI Data Infrastructure” white paper states: “MinIO’s performance characteristics mean that you can run multiple Apache Spark
without suffering a storage bottleneck.”
MinIO’s distributed setup allows for parallel data access and I/O operations
reducing latency and accelerating training times
MinIO’s high-throughput data access ensures swift retrieval and deployment of AI models
and enables predictions with minimal latency
More importantly MinIO’s performance scales linearly from 100s of TBs to 100s of PBs and beyond.”
It quotes performance benchmarks with a distributed MinIO setup delivering 46.54GB/sec average read throughput (GET) and 34.4GB/sec write throughput (PUT) with an 8-node cluster
A 32-node cluster delivered 349GB/sec read and 177.6GB/sec write throughput
MinIO says it has customer deployments of 300 servers that are reading at 2.75TB/sec
We can take it as read that a MinIO setup can attain overall GPUDirect-like speeds but there are no comparisons we can find between such a MinIO system and a GPUDirect-supporting parallel file system delivering the same overall bandwidth
We can’t therefore directly compare the number and cost of the server, storage, and network components in a MinIO system and a GPUDirect-based alternative, with each providing 349GB/sec read and 177.6GB/sec write throughput
The DataPOD white paper says: “Enterprise customers using MinIO for AI initiatives build exabyte scale data infrastructure as repeatable units of 100PiB.” These consist of 30 racks with 10x 64-port network spine switches, each storage server being a single-socket, 64-core CPU-powered system with 128 PCIe 4 lanes
The reference architecture doc identifies Supermicro A+ 2114SWN24RT
a Dell PowerEdge R761 rack server and HPE ProLiant DL345 Gen 11 as valid servers
It reckons such a setup would cost $1.5 per TB/month for the hardware and $3.54 per TB/month for the MinIO software – $1,500/month for the hardware
$3,540/month for the software and $5,040/month all-in
MinIO asserts that “Vendor-specific turnkey hardware appliances for AI will result in high TCO and is not scalable from a unit economics standpoint for large data AI initiatives at exabyte scale.”
It argues “AI data infrastructure in public clouds is all built on top of object stores
This is a function of the fact that the public cloud providers did not want to carry forward the chattiness and complexity associated with POSIX
The same architecture should be no different when it comes to private/hybrid cloud deployments.”
MinIO goes on to assert: “As high-speed GPUs evolve and network bandwidth standardizes on 200/400/800Gbit/sec and beyond
purpose-built object storage will be the only solution that would meet the performance SLAs and scale of AI workloads.”
and VAST Data – the GPUDirect-supporting parallel filestore suppliers – will all disagree with that point
Are your organization's data management tools up to the task? Trino, an open-source distributed SQL query engine, can give you better data processing and analysis
Maybe you've heard about Trino but want to know more before you change systems
Find out more here about Trino and how it can improve query performance
Trino is an open-source distributed SQL query engine
Engineers designed Trino for ad hoc and batch ETL queries against several types of data sources
Trino supports relational and non-relational sources
Trino can handle standard and semi-structured data types
Some people mistakenly think Trino is a database
Trino doesn't actually store any data
Trino split from Facebook's Presto project
Engineers at Facebook developed Presto to process the petabytes of data Facebook was trying to analyze
The creators wanted Presto to remain open-source and community-based
Facebook applied for a trademark for the name Presto
Since the split, the functionalities of each system have started to differ
Trino started as a way to manage the incredibly large data sets Facebook needed to analyze
This allows Trino queries to process faster than queries using other engines
Several factors contribute to this acceleration
Trino architecture is similar to massively parallel processing (MPP) databases
A coordinator node manages multiple worker nodes to process all the work
The coordinator partitions the data into smaller chunks to distribute across the nodes
When data chunks arrive at a particular machine, processing happens over multiple threads within that node
Users can run more complex operations like JSON and MAP transformations and parsing
One factor in Trino's speed is that it doesn't rely on checkpointing and fault-tolerance methods. Checkpointing aids recovery, but it also creates a large amount of latency
Removing the fault-tolerance requirement is a major change from older big data systems
It makes Trino ideal for queries where the cost of recovering from failure is less than the cost of checkpointing
Trino can push the processing of queries down into the connected data source
The operation goes to the source system where custom indexes on the data already exist
Pushdown improves overall query performance
It reduces the amount of data read from storage files
These forms of pushdown reduce network traffic between Trino and the data source
They also reduce the load on the remote data source
Support for pushdown depends on each connector and the underlying database or storage system
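One way to check whether a predicate is pushed down is to inspect the query plan; in this hypothetical example against a PostgreSQL catalog, a pushed-down filter appears inside the table scan rather than as a separate Trino filter operator:

```sql
-- Catalog, schema, and table names are illustrative
EXPLAIN
SELECT order_id, total
FROM postgres.public.orders
WHERE status = 'shipped';
```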
The Trino cost-based optimizer (CBO) uses table and column statistics to create the most efficient query plan
It considers the three main factors that contribute to how long a query takes: CPU time, memory usage, and network I/O
The CBO balances the different demands for queries while ensuring that all cluster users can work at the same time
You can only truly optimize for one of these priorities
The CBO creates and compares different variants of a query execution plan to find the option with the lowest overall cost
Trino runs storage and computing separately
The Trino cluster doesn't store your data
so it can auto-scale depending on the load without losing any data
Trino is an online analytical processing (OLAP) system
Trino extends the traditional OLAP data warehouse solution by running as a query engine for a data lake or data mesh
You can interactively run queries across various data sources
you don't need to move the data ahead of time
The power and flexibility of Trino make it well-suited for many use cases
You can use it for all of them or to solve one particular problem
As the Trino users in your organization gain experience with its benefits and features
you'll likely discover other uses as well
End-users can use SQL to run ad hoc queries where the data resides
You don't have to move the data to a separate system
You can quickly access data sets that analysts need
You can query data across many sources to build reports and dashboards for business intelligence
Data scientists and analysts can create queries without needing to rely on data ops engineers
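A federated query might look like the following sketch; the postgres and hive catalog names and the table schemas are assumptions for illustration:

```sql
-- Join customer records in PostgreSQL with event data on S3 (Hive connector)
SELECT c.customer_name, COUNT(*) AS event_count
FROM postgres.public.customers AS c
JOIN hive.web.events AS e
  ON e.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY event_count DESC
LIMIT 10;
```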
One common use case for Trino is directly querying data on a data lake without needing transformation
You can query structured or semi-structured data from multiple sources
This streamlines the process of creating operational dashboards
Trino can use the Hive connector against HDFS and other object storage systems
You can get SQL-based analytics on your data lake however it stores data
Trino is a great engine for your batch extract, transform, and load (ETL) jobs
It can rapidly process a large amount of data
It can bring in data from different sources without always needing to extract it from sources like MySQL
An ETL through Trino is a standard SQL statement
End users can perform other ad hoc transformations
The extensive Trino connector framework means that any connector can be the source for an ETL
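A minimal ETL sketch, assuming a mysql source catalog and a hive target catalog (all names are hypothetical):

```sql
-- Read yesterday's orders from MySQL and land them in the data lake
INSERT INTO hive.lake.daily_orders
SELECT order_id, customer_id, total, order_date
FROM mysql.shop.orders
WHERE order_date = current_date - INTERVAL '1' DAY;
```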
If you were wondering how Trino can accelerate your queries, or you're comparing cutting-edge query engines, several features of Trino can improve your query performance
Trino architecture uses massively parallel processing, and pushdown and the cost-based optimizer also accelerate the query lifecycle
The Trino open source distributed query engine is known as a choice for running ad-hoc analysis where there’s no need to model the data and it can be easily sliced and diced
Trino can be also leveraged for running geospatial workloads atop different data sources
When it comes to running geospatial queries
Trino is OpenGIS compliant and supports a wealth of geospatial-specific functions using WKT/WKB vector data formats
Its diverse array of functions gives you the ability to unify and join geospatial data from multiple sources
For example, you can join points of interest by using Trino’s Postgres connector with events tables stored on S3 by using the Hive connector.
In this post, we’ll walk you through two methods for running geospatial queries on the data lake using Trino’s Hive connector, and explore some optimizations to help you accelerate and improve the interactivity of your geospatial queries
Here is a list with some common geospatial analysis use-cases we came across:
Trino offers a wide range of geospatial functions that can be used out of the box
The first use case might require running queries that get all the riders that were up to some distance from the points of interest (restaurants, for example)
Below you will find such a query that joins the trips_data table and the places table
and counts the number of riders in trips that passed up to 500 meters from the points of interest stored in the places table
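A sketch of such a query, assuming hypothetical rider_id, lat, and lon columns; casting to spherical geography makes ST_Distance return meters:

```sql
-- Count riders that passed within 500 meters of each point of interest
SELECT p.name, COUNT(DISTINCT t.rider_id) AS riders
FROM trips_data AS t
JOIN places AS p
  ON ST_Distance(
       to_spherical_geography(ST_Point(t.lon, t.lat)),
       to_spherical_geography(ST_Point(p.lon, p.lat))
     ) <= 500
GROUP BY p.name;
```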
Another example for using the Geospatial functions in Trino is running queries related to use-case 2 in which we look up for events that took place in a well-defined area
Below you can find such a query that counts the number of drivers that drove in a specific area:
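A sketch of such a query, with an illustrative WKT polygon and hypothetical driver_id, lat, and lon columns:

```sql
-- Count distinct drivers whose points fall inside the polygon
SELECT COUNT(DISTINCT driver_id) AS drivers
FROM trips_data
WHERE ST_Contains(
        ST_GeometryFromText(
          'POLYGON ((34.75 32.05, 34.80 32.05, 34.80 32.10, 34.75 32.10, 34.75 32.05))'),
        ST_Point(lon, lat));
```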
In the above query the Geospatial functions used in the predicate were not pushed down to the Hive connector
To work around this, it’s possible to build an additional bounding box by using the lon and lat columns as suggested in query #2.1, or to use the Bing tiles system as suggested in query #4
Here we built a bounding box using the lat and lon columns, which includes the polygon of interest
The filters on these columns are being pushed down to the Hive connector
which results in reading less data from S3
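A sketch of the bounding-box variant: the plain lat/lon range predicates below enclose the polygon of interest and can be pushed down to the Hive connector, while the exact geometry check runs afterwards (the polygon and column names are illustrative):

```sql
SELECT COUNT(DISTINCT driver_id) AS drivers
FROM trips_data
WHERE lon BETWEEN 34.75 AND 34.80   -- pushed down to the connector
  AND lat BETWEEN 32.05 AND 32.10   -- pushed down to the connector
  AND ST_Contains(                  -- exact check, evaluated by Trino
        ST_GeometryFromText(
          'POLYGON ((34.75 32.05, 34.80 32.05, 34.80 32.10, 34.75 32.10, 34.75 32.05))'),
        ST_Point(lon, lat));
```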
Although we saw a performance improvement by adding the bounding box predicate both on the Hive and Varada connector
it’s important to note that dynamically building a bounding box is not always straightforward
Bing tiles are a way to define map areas at different resolutions:
They are stored as “quadkeys”:
Note that a prefix of a tile quadkey is always a larger tile that includes that tile at a lower resolution
In order to utilize the Bing tile segmentation, we can either create the Bing tile during the query runtime (as we did in query #4) or by using an ETL/ELT procedure that will add a new quadkey column to both the trips_data and the places table. In this article, we used Trino’s CTAS command for the ELT procedure described here
Once the quadkey column is in place, we can JOIN the places table and the trips_data table by the quadkey column
CTAS for creating the trips_data_bing table
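A sketch of such a CTAS, assuming zoom level 14 and hypothetical lat/lon columns:

```sql
-- Materialize a quadkey column so later joins can use it directly
CREATE TABLE trips_data_bing AS
SELECT
  t.*,
  bing_tile_quadkey(bing_tile_at(t.lat, t.lon, 14)) AS quadkey
FROM trips_data AS t;
```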
The instructions for creating the trips_data and places table can be found here
Below in Query #3 we implement the same logic as query #1 and utilize the Bing tiles for doing the join between the places and trips table
This query joins the table by the quadkey column, which significantly reduces the number of rows read and boosts the query performance thanks to the dynamic filtering optimization
Bing tiles can also be leveraged as can be seen below in query #4
which implements the same logic we saw in query #2:
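A sketch of a runtime Bing-tile variant: geometry_to_bing_tiles computes the tiles covering the polygon, and rows are pre-filtered by the quadkey of each point's tile before the exact containment check (the polygon, zoom level, and column names are illustrative):

```sql
SELECT COUNT(DISTINCT t.driver_id) AS drivers
FROM trips_data AS t
WHERE bing_tile_quadkey(bing_tile_at(t.lat, t.lon, 14)) IN (
        SELECT bing_tile_quadkey(tile)
        FROM UNNEST(geometry_to_bing_tiles(
               ST_GeometryFromText(
                 'POLYGON ((34.75 32.05, 34.80 32.05, 34.80 32.10, 34.75 32.10, 34.75 32.05))'),
               14)) AS c(tile))
  AND ST_Contains(
        ST_GeometryFromText(
          'POLYGON ((34.75 32.05, 34.80 32.05, 34.80 32.10, 34.75 32.10, 34.75 32.05))'),
        ST_Point(t.lon, t.lat));
```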
When creating the Bing tile during query execution
execution time was slower for this query compared to the equivalent query #2
which might suggest that for these types of queries, creating the Bing tile during query execution might not always be beneficial
when we ran the same experiment using the Varada Trino Connector we did see a significant improvement
After running the above queries, we can compile the data into a mini benchmark that details the queries’ runtimes and shows the improvement factor after implementing the suggested optimizations. Here are the results of the mini benchmark, using 4x i3.4xlarge machines on Amazon Web Services
Optimization improves performance by up to four times
Running these queries underlines the fact that you can use Trino to run geospatial queries on your data lake thanks to its out-of-the-box support for geospatial functions. By simply applying the suggested methods above, you can improve your geospatial query performance by up to 4x
You can leverage your investment in Trino to run geospatial analysis today on the Hive connector by applying any of the methods and optimizations described in this article. And check out this post for results of running the same queries on the Community Edition of Varada’s Trino connector
I have submitted a patch for [this](https://github.com/aakashnand/trino-ranger-demo) issue and there is already an open JIRA issue here but that will not stop us from integrating Trino with Apache Ranger
I have built the Apache Ranger 2.1.0 with the Trino plugin
If you want to build Apache Ranger from source code including the Trino plugin, you can refer to this GitHub repository on the branch ranger-2.1.0-trino. Note that the Trino plugin is now officially available in the Ranger repository, released with Apache Ranger 2.3: https://github.com/apache/ranger/tree/ranger-2.3
Apache Ranger has three key components: ranger-admin, ranger-usersync, and ranger-audit
Note: Configuring ranger-usersync is out of scope for this tutorial and we will not use any usersync component for this tutorial
The Ranger Admin component provides a UI through which we can create policies for different access levels; in our case, we are using Postgres as the backend database for the Ranger Admin UI
Ranger Audit component collects and shows logs for each access event of the resource
We will use Elasticsearch to store Ranger audit logs, which will then be displayed in the Ranger Audit UI as well
Trino is a fast distributed query engine. It can connect to several data sources such as Hive, Postgres, Oracle, and so on. You can read more about Trino and Trino connectors in the official documentation here
we will use the default catalog tpch which comes with dummy data
Apache Ranger supports many plugins such as HDFS
Each of these plugins needs to be configured on the host which is running that process
The Trino-Ranger-Plugin is the component that communicates with Ranger Admin to check for and download access policies, which are then synced with the Trino server. The downloaded policies are stored as JSON files on the Trino server under /etc/ranger/<service-name>/policycache, so in this case the policy path is /etc/ranger/trino/policycache
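To see what the plugin has synced, you can inspect the cached JSON directly. The sketch below parses a simplified policy-cache structure; the field names are an assumption for illustration, as the real schema Ranger writes is considerably richer:

```python
import json

# Sketch of inspecting a Ranger policy-cache file such as one found under
# /etc/ranger/trino/policycache. The structure below is a simplified
# assumption for illustration; the real cache schema is richer.
sample_cache = json.loads("""
{
  "serviceName": "trino",
  "policies": [
    {"id": 3, "name": "all-functions", "isEnabled": true,
     "resources": {"function": {"values": ["*"]}},
     "policyItems": [{"users": ["{USER}"], "accesses": [{"type": "execute"}]}]}
  ]
}
""")

# List each policy with the users it applies to.
for policy in sample_cache["policies"]:
    users = [u for item in policy["policyItems"] for u in item["users"]]
    print(policy["id"], policy["name"], "->", users)
```

Checking this file on the Trino host is a quick way to confirm that policy changes made in the Ranger Admin UI have actually reached the server.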
The communication between the above components is explained in the following diagram
The docker-compose file connects all of the above components
The Ranger-Admin process requires a minimum of 1.5 GB of memory
The Ranger-Admin tar file contains install.properties and setup.sh
The setup.sh script reads the configuration from install.properties. The following patch file describes the configuration changes made to install.properties, compared to the default version, for the Ranger-Admin component
Ranger-Trino-Plugin tar file also contains install.properties and enable-trino-plugin.sh script
One important point to note about the Trino docker environment is that the configuration files and the plugin directory live in different locations: configuration is read from /etc/trino, whereas plugins are loaded from /usr/lib/trino/plugins. These two directories are important when configuring install.properties for the Trino-Ranger-Plugin, and hence some extra customization of the default enable-trino-plugin.sh script that ships with the Trino-Ranger-Plugin tar file is required to make it work with dockerized Trino
These changes are highlighted in the following patch file
These changes introduce two new custom variables, INSTALL_ENV and COMPONENT_PLUGIN_DIR_NAME, which can be configured in install.properties
install.properties file for Trino Ranger Plugin needs to be configured as shown in the following patch file
Please note that we are using the two newly introduced custom variables to inform the enable-plugin script that Trino is deployed in a docker environment
Finally, putting it all together in the docker-compose.yml as shown below. This file is also available in Github Repository here
we will deploy docker-compose services and confirm the status of each component
Once we deploy services using docker-compose
we should be able to see four running services
Let’s confirm that Trino and Ranger-Admin services are accessible on the following URLs
Ranger Admin: http://localhost:6080
Trino: http://localhost:8080
Elasticsearch: http://localhost:9200
Let’s access Ranger-Admin UI and log in as admin user
We configured the admin user’s password, rangeradmin1, in the ranger-admin-install.properties file above
let’s create a service with the name trino
The service name should match with the name defined in install.properties for Ranger-Admin
Please note the hostname in the JDBC string
From ranger-admin container trino is reachable at my-localhost-trino hence hostname is configured as my-localhost-trino
If we click on Test Connection we will get a Connection Failed error as shown below
This is because the Ranger-Admin process is already running and is looking for a service with the name trino, which we have not created yet. So let’s add the trino service and then click Test Connection again
Now Ranger-Admin is successfully connected to Trino 🎉
navigate to the Audit section from the top navigation bar
We can see that audit logs are displayed 🎉
Ranger-Admin and Elasticsearch are working correctly
it is time to create actual access policies and see them in action
To understand the access scenario and create an access policy, we first need a test user. Ranger usersync provides a rich and flexible set of configuration properties to sync users, groups, and group memberships from various sources such as AD/LDAP, supporting a wide variety of use cases. Since usersync is out of scope for this tutorial, we will manually create a test user from the Ranger-Admin UI
let’s navigate to Settings → Users/Groups/Roles → Add New User
When creating a user we can choose different roles
Let’s confirm access for the user ranger-admin
As we can see ranger-admin user can access all the tables under schema tpch.sf10
Since we have not configured any policy for test-user if we try to access any catalog or execute any query, we should see an access denied message. Let’s confirm this by executing queries from Trino CLI
Let’s create a policy that allows test-user access to all tables under tpch.sf10
We can also assign specific permissions on each policy
but for the time being let’s create a policy with all permissions
We are still getting an access denied message. This is because Trino Ranger policies need to be configured at each object level: a catalog-level policy, a catalog+schema+table-level policy, and an information_schema policy
Let’s add a policy for the catalog level first. We still get an error, but the error message is different
Let’s navigate to Ranger Audit Section to understand more about this
We can see an entry that denied permission to a resource called tpch.information_schema.tables.table_schema
information_schema is the schema that contains metadata about tables and table columns. So it is necessary to add a policy for information_schema as well
Access to information_schema is required for any user to execute the query in Trino
we can use the {USER} variable in the Ranger policy to grant that access to all users
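Conceptually, {USER} is a macro that resolves to whoever is making the request, so a single policy item can cover every user. Here is a tiny Python sketch of that matching behavior (greatly simplified; real Ranger evaluation also considers groups, roles, deny conditions, and masks):

```python
# Simplified sketch of how Ranger's {USER} macro behaves when a policy
# item is matched against the requesting user.
def users_match(policy_users: list, requesting_user: str) -> bool:
    # {USER} resolves to the requesting user, so it matches anyone.
    resolved = [requesting_user if u == "{USER}" else u for u in policy_users]
    return requesting_user in resolved

print(users_match(["{USER}"], "test-user"))       # matches any user
print(users_match(["ranger-admin"], "test-user")) # matches only ranger-admin
```

This is why adding {USER} to the information_schema policy (and later the functions policy) unblocks every user at once instead of requiring one policy item per user.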
Let us confirm the access from Trino CLI again
We still get access denied if we try to execute any SQL function
The all-functions policy (ID: 3) is the policy that allows execution of any SQL function. Since executing SQL functions is a requirement for all users, let’s edit the all-functions policy (ID: 3) and add all users via the {USER} variable to grant access to functions
To give test-user access to ALL tables under sf10, we added three new policies and edited the default all-functions policy
Now we can access and execute queries for all tables for sf10 schema
let’s understand how to give access to test-user for a specific table under schema sf10
we previously configured policies that give access to ALL tables under the sf10 schema. To give access to a specific table, we need to add a schema-level policy and then configure the table-level policy. So let us add a schema-level policy for tpch.sf10. Now let us edit sf10-all-tables-policy to cover a specific table instead of all tables. We will configure the policy to allow access to only the nation table
So finally we have the following active policies
Now let’s execute queries from Trino CLI again for test-user
test-user can now access only the nation table from the tpch.sf10 schema, as desired
If you have followed all the steps and reached the end, you now understand how to configure Trino with Apache Ranger
Due to the lack of good documentation and the not-so-intuitive nature of the integration process, integrating Apache Ranger and Trino can be painful, but I hope this article makes it a bit easier. If you are using Trino, I highly recommend joining the Trino Community Slack for more detailed discussions
There’s not much we like more than brand-new gear to review
and it’s tough to tell from the last one
you really get to see what brands can cook up
we’re talking Arc’teryx and taking a closer look at some apparel that’s ready for the dead of winter
While this gear is directed towards trail runners
You’ll find (like we did) that the gear is so good it needs almost no introduction
and they’re one of the few companies where the quality justifies the price tag
MEAGHAN: The Trino SL Anorak is a really lightweight
breathable jacket made from GORE-TEX INFINIUM
Since the GORE material works as the outer surface
water simply beads and runs right off the jacket
Arc’teryx suggests this is a trim-fitting jacket
but I could easily add a few layers underneath without issue
There’s a low-profile hood for an added layer of protection and a zippered kangaroo pocket in the front if you want to store a phone or some keys
this pocket doesn’t work great while running
and any items you throw in there will bounce around
What I do love about this Anorak is its versatility
I’ve used it for a windy 50-degree day and a cold
The material keeps you warm and breathes well if you start to sweat out there
I love the Tatsu color (what I would call olive green) I received
This would be an ideal jacket for just about any outdoor activity — running
or even just walking around the city on a cold
this jacket will definitely last you a very long time
ROBBE: This is my first time reviewing Arc’teryx
As with most things claiming to be breathable-yet-water-repellent
they end up as breathable as a low-grade trash bag
Thomas assured me that GORE-TEX Infinium was the real deal
so I was excited to try out this “hoody” (I mean
A quick rundown of the hoody: it’s a water-repellent running jacket made from GORE-TEX Infinium
which is meant to block wind while providing exceptional breathability while repelling light rain or snow
The shell is soft and features a four-way stretch
offering plenty in the range-of-motion department while on the run
I tested this hoody in various sub-freezing conditions, down to 18°F with a windchill of 9°F. The warmest temperature I tested it in was 36°F with minimal wind
this jacket could go down to any manageable level of cold
you simply just add more layers beneath it
A back hem provides extra length while the sleeves extend just a bit beyond the wrist for coverage in less-than-fair conditions
It’s a trim cut but not so slim that you can’t add layers beneath it
it’s the perfect cut for adding a layer or two while keeping close enough that cold air can’t get between
In 20-30F temps, I wore a Tracksmith Brighton Base Layer beneath it
A long sleeve under this at 36F was much too warm
I can tell you that a lot of jackets I’ve worn
become a straight sauna at any temperature
no matter how well you try and figure out your layering technique
the Trino SL was actually very breathable while doing precisely what it said it would do – blocking the wind and offering light rain repellency
I wore it on some frigid and dark mornings with considerable gusts of wind
It has quickly become my go-to running jacket in adverse weather conditions
I was actually very surprised at how well it held up in the rain
I wore it during a steady rain (not downpour-level rain
but definitely umbrella-necessitating for my kids walking to school)
and the water beaded up and rolled off as it would with any waterproof jacket
it will certainly hold up in most rainy conditions
To have a breathable jacket that manages to do that is a delight
I should mention that although it offers welded seams around the collar and the length of the zipper
the rest is simply stitched together (again
so it molds nicely around the face with a brim that extends far enough out from the forehead to keep water out of the eyes
It also has a cinch cord in the back to tighten it
the hood can be rolled up and snapped into the collar
which is great because running with the hood down was somewhat annoying
The bill on the hood bounces on your back and kind of pulls on the jacket collar if the hood is free
which was really the only downside of the jacket for me
and it actually looks pretty cool as a casual jacket
The inside zip pockets are a soft light mesh that pleased me in a way that’s hard to describe
Most jacket pockets absolutely suck in terms of tactile feel
These pockets felt like I always wanted to just have my hands in there
the right pocket has a modified compartment that allows for upright items like keys or a wallet to be stored so they won’t fall out if you forget to zip up the pocket
I should say that while the Arc’teryx Trino SL is reasonably lightweight, there are more lightweight and fully waterproof options out there (like the Mammut Kento Light)
this does not pack down into a pocket or bag
I kind of hate those ultralight jackets because they’re just so specific for one purpose — saving weight
And let’s face it — most of us aren’t elite adventurers
If I’m paying $200 for a lightweight jacket
I want something I can wear on and off the run that accommodates me in both situations
which is exactly where the Arc’teryx Trino SL falls
Trino SL Hoody and Trino SL Tights (Sunglasses by Ombraz
and it’s super weird it doesn’t because you can’t really hang it from the hood on account of the brim
the brim bounces and its weight pulls on the collar if the hood is not secured
but it’s also hard to slide above your GPS watch (especially if it’s a big one like the Coros Vertix 2)
You may want to wear your watch on your sleeve
I really loved the wine colorway of this jacket
but I don’t believe it’s available at the moment (perhaps it’s coming soon)
I love this hoody and expect it to become a staple of my running wardrobe
If we’re talking Arc’teryx quality and construction
I think it’s pretty reasonable at $219
I wasn’t supposed to be testing these
as they were a women’s small that was sent to Meaghan
the women’s small was laughably large on her (a 31″ inseam for a 5’2″ woman
and it fit almost perfectly (though it was still a tad long)
I should also point out that the name is a misnomer; in no world are these running tights
Which is fine — a lot of people don’t like tights
and these can crossover from running to other outdoor endeavors like hiking and bikepacking
Like the Arc’teryx Trino SL Hoody, the tights feature GORE-TEX INFINIUM, which is, again, windproof, breathable, and water-resistant. On the run, they offer plenty of room for range of motion and do exactly what they’re meant to do — block out the wind and moderate rain. However, I would recommend adding a light full-length base layer like the Path Projects Tahoe during colder temperatures
this is for the runner who doesn’t want a tight but also wants incredibly high-quality technology that will get them through most adverse weather conditions
THOMAS: The Arc’teryx Cormac pant is perfect for crossover running
“WTH is crossover running?!” I’m talking about those days here in covid times when you don’t have to shower after your run
Like when you can get your miles and then grab breakfast standing up
that turns into just working throughout the day
and you realize you haven’t bathed all day
but you feel perfectly fine throwing on a clean t-shirt to help get dinner ready
you might just shed the Cormac Pant to your bedroom floor
and then reach for them in the morning to start the whole cycle over again
See also: The Best Winter Running Pants and Tights
The fitted pant with a drawstring waist has incredibly soft fabric that is insanely comfortable
with two zippered hand pockets lined with reflective tape
The fabric is thick enough to keep you warm during winter runs
While most jogger-style pants with side pockets are bounce-houses for phones or keys
the Cormac pant kept things surprisingly bounce-free
That’s usually the downfall of most “jogger-style” pants
but I was pleasantly surprised with how well the Cormac kept things in place
the Cormac is a staple that feels luxurious
The Cormac fits nicely in my regular medium size and will cost you $129
you’ll probably join my all-day wear club since these pants can do it all from sport to lounge to casual
I’m tempted to see how they look with a sports coat and oxford
Meaghan signed up for her first marathon three weeks before the race
because it was $10 more than the half she planned to run
She learned everything in running the hard way
Now a USATF & UESCA certified run coach
she loves encouraging friends to go for big goals as she continues to chase faster times
Robbe is the senior editor of Believe in the Run
He loves going on weird routes through Baltimore
Δdocument.getElementById( "ak_js_2" ).setAttribute( "value"
Δdocument.getElementById( "ak_js_3" ).setAttribute( "value"
There are several challenges that a modern blockchain indexing startup may face
As blockchain technology has become more widespread
the amount of data stored on the blockchain has increased
This is because more people are using the technology
and each transaction adds new data to the blockchain
blockchain technology has evolved from simple money-transferring applications
such as those involving the use of Bitcoin
to more complex applications involving the implementation of business logic within smart contracts
These smart contracts can generate large amounts of data
contributing to the increased complexity and size of the blockchain
this has led to a larger and more complex blockchain
we review the evolution of Footprint Analytics’ technology architecture in stages as a case study to explore how the Iceberg-Trino technology stack addresses the challenges of on-chain data
Footprint Analytics has indexed data from about 22 public blockchains
and over 100,000 NFT collections into a semantic abstraction data layer
It’s the most comprehensive blockchain data warehouse solution in the world
which includes over 20 billion rows of financial transaction records
it’s different from the ingestion logs in traditional data warehouses
We have experienced 3 major upgrades in the past several months to meet the growing business requirements:
At the beginning of Footprint Analytics, we used Google BigQuery as our storage and query engine; BigQuery is a great product
and provides dynamic arithmetic power and a flexible UDF syntax that helps us quickly get the job done
So we decided to explore alternative architectures
We were very interested in some of the OLAP products which had become very popular
The most attractive advantage of OLAP is its query response time
which typically takes sub-seconds to return query results for massive amounts of data
and it can also support thousands of concurrent queries
We picked one of the best OLAP databases, Doris
but we soon ran into some other issues:
we couldn’t use Doris for our whole data pipeline on production
so we tried to use Doris as an OLAP database to solve part of our problem in the data production pipeline
acting as a query engine and providing fast and highly concurrent query capabilities
so we had to periodically synchronize data from Bigquery to Doris using it as a query engine
This synchronization process had several issues
one of which was that the update writes got piled up quickly when the OLAP engine was busy serving queries to the front-end clients
the speed of the writing process got affected
and synchronization took much longer and sometimes even became impossible to finish
We realized that OLAP could solve only some of the issues we were facing and could not become the turnkey solution for Footprint Analytics, especially for the data processing pipeline; OLAP as a query engine alone was not enough for us
Welcome to Footprint Analytics architecture 3.0
a complete overhaul of the underlying architecture
We have redesigned the entire architecture from the ground up to separate the storage
computation and query of data into three different pieces
Taking lessons from the two earlier architectures of Footprint Analytics and learning from the experience of other successful big data projects like Uber
we first turned our attention to the data lake
a new type of data storage for both structured and unstructured data
Data lake is perfect for on-chain data storage as the formats of on-chain data range widely from unstructured raw data to structured abstraction data Footprint Analytics is well-known for
We expected to use data lake to solve the problem of data storage
and ideally it would also support mainstream compute engines such as Spark and Flink
so that it wouldn’t be a pain to integrate with different types of processing engines as Footprint Analytics evolves
and we can choose the most appropriate computation for each of our metrics
With Iceberg solving the storage and computation problems
we had to think about choosing a query engine
The most important thing we considered before going deeper was that the future query engine had to be compatible with our current architecture
We settled on Trino, which has very good support for Iceberg; the team was so responsive that when we raised a bug, it was fixed the next day and released in the latest version the following week
This was the best choice for the Footprint team
which also requires high implementation responsiveness
we did a performance test on the Trino + Iceberg combination to see if it could meet our needs and to our surprise
Knowing that Presto + Hive has been the worst comparator for years in all the OLAP hype
the combination of Trino + Iceberg completely blew our minds
case1: a big table joins a small table — an 800 GB table1 joins another 50 GB table2 and does complex business calculations
case2: use a big single table to do a distinct query
Test sql: select distinct(address) from the table group by day
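In plain terms, the case-2 benchmark boils down to counting unique addresses per day over one large table. A toy Python equivalent of that aggregation (the sample rows are made up for illustration):

```python
from collections import defaultdict

# What the case-2 benchmark query computes, sketched over toy rows:
# the number of unique addresses per day. The real query scans a single
# very large on-chain transactions table.
rows = [
    {"day": "2022-01-01", "address": "0xabc"},
    {"day": "2022-01-01", "address": "0xabc"},
    {"day": "2022-01-01", "address": "0xdef"},
    {"day": "2022-01-02", "address": "0xabc"},
]

unique_per_day = defaultdict(set)
for row in rows:
    unique_per_day[row["day"]].add(row["address"])

result = {day: len(addrs) for day, addrs in sorted(unique_per_day.items())}
print(result)  # {'2022-01-01': 2, '2022-01-02': 1}
```

A distinct aggregation like this is a good stress test for a query engine because deduplication over a huge table forces heavy shuffling and memory use, which is exactly where engine differences show up.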
The Trino+Iceberg combination is about 3 times faster than Doris in the same configuration
there is another surprise because Iceberg can use data formats such as Parquet
Iceberg’s table storage takes only about 1/5 of the space of other data warehouses. The storage size of the same table in the three databases is as follows:
Note: The above tests are examples we have encountered in actual production and are for reference only
The performance test results gave us enough confidence to proceed, and it took our team about 2 months to complete the migration
and this is a diagram of our architecture after the upgrade
Footprint Analytics team has completed three architectural upgrades in less than a year and a half
thanks to its strong desire and determination to bring the benefits of the best database technology to its crypto users and solid execution on implementing and upgrading its underlying infrastructure and architecture
The Footprint Analytics architecture upgrade 3.0 has brought a new experience to its users
allowing users from different backgrounds to get insights in more diverse usage and applications:
Footprint Analytics provides API and visualization tools to uncover and visualize data across the blockchain
© 2025 CryptoSlate. All rights reserved. Disclaimers | Terms | Privacy
One of the most popular cartoonists in Mexico will be celebrated Tuesday with the opening of "Trino's World" — El Mundo de Trino — at the Comic-Con Museum in Balboa Park
The museum and the Consulate General of Mexico in San Diego will cut the ribbon at 11 a.m
"It is our honor to work with the Consulate General of Mexico in San Diego to bring the work of this renowned artist to the Comic-Con Museum," said Rita Vandergaw
"and as a binational community we are doubly excited to highlight Trino's work."
and objects from Trino's personal collection intended to offer a glimpse into his "creative universe." He was born in Guadalajara
Jalisco in 1961 and has a career spanning more than four decades both in print and electronic media
including the National Prize for Journalism in Political Cartoons in 2000 and the Inkpot Award at Comic-Con in 2022 for his contributions to the world of comics
He has published more than 20 books with his comic strips and drawings
Together with his "partner-in-comics," the cartoonist Jis
he created the animated film "El Santos vs La Tetona Mendoza" in 2012
they have hosted the television program "La Chora Interminable" together
Trino has become a true icon of Mexican popular culture," said the Consul General of Mexico
"The fact that Comic-Con Museum has opened the space for an exhibition of his versatile work speaks of the impact that this beloved and respected cartoonist from Jalisco has achieved beyond our borders
to the point that his art today serves as a bridge for dialogue and understanding between the people of Mexico and the United States.