Guide to expanding contractions in Python

Text preprocessing is a crucial step in NLP. Cleaning our text data to convert it into a presentable form that can be analyzed and used for prediction in our task is known as text preprocessing. In this article, we discuss contractions and how to handle them in text.

    What are contractions?

    Contractions are words or combinations of words that are shortened by dropping letters and replacing them with an apostrophe.

    Nowadays, as everything moves online, we communicate with others more and more through text messages or posts on social media such as Facebook, Instagram, WhatsApp, Twitter, and LinkedIn. With so many people to talk to, we rely on abbreviations and shortened word forms when messaging.

    For example: I’ll be there within 5 min. Are u not gng there? Am I mssng out on smthng? I’d like to see u near d park.

    In informal English, we often drop the vowels from a word to shorten it. Expanding (that is, removing) contractions helps standardize text and is useful when working with Twitter data or product reviews, because individual words play an important role in sentiment analysis.

    How to expand contractions?

    1. Using the contractions library

    First, install the library. You can try it on Google Colab, where installation is very smooth.

    Using pip:

    !pip install contractions

    In a Jupyter notebook:

    import sys  
    !{sys.executable} -m pip install contractions

    Code 1: Expanding contractions with the contractions library

    Python3

    # import the library
    import contractions

    # text containing contractions
    text = '''I'll be there within 5 min. Shouldn't you be there too?
              I'd love to see u there my dear. It's awesome to meet new friends.
              We've been waiting for this day for so long.'''

    # expand each whitespace-separated token with contractions.fix
    expanded_words = []
    for word in text.split():
        expanded_words.append(contractions.fix(word))

    expanded_text = ' '.join(expanded_words)
    print('Original text: ' + text)
    print('Expanded_text: ' + expanded_text)

    Output:

    Original text: I'll be there within 5 min. Shouldn't you be there too? 
              I'd love to see u there my dear. It's awesome to meet new friends.
              We've been waiting for this day for so long.
    Expanded_text: I will be there within 5 min. should not you be there too? 
              I would love to see you there my dear. it is awesome to meet new friends. 
              we have been waiting for this day for so long.

    Expanding contractions before building word vectors helps reduce dimensionality.
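
A minimal sketch of why this helps, using only the standard library. The tiny `EXPANSIONS` mapping, the `expand` helper, and the sample sentences are illustrative assumptions, not part of the contractions library. Once contractions are expanded, different spellings of the same phrase collapse into the same tokens, so the vocabulary (and any bag-of-words vector built on it) gets smaller.

```python
# Hypothetical, hand-written mapping used only for this illustration.
EXPANSIONS = {"don't": "do not", "i'll": "i will", "it's": "it is"}

def expand(text):
    """Lowercase the text and expand the few contractions listed above."""
    words = []
    for token in text.lower().split():
        words.extend(EXPANSIONS.get(token, token).split())
    return words

corpus = [
    "I do not think it is late",
    "I don't think it's late",
    "I will go but I'll be back",
]

# Vocabulary before and after expansion.
raw_vocab = {tok for doc in corpus for tok in doc.lower().split()}
expanded_vocab = {tok for doc in corpus for tok in expand(doc)}

print(len(raw_vocab), len(expanded_vocab))  # prints: 15 12
```

In this toy corpus the tokens "don't", "it's", and "i'll" disappear from the vocabulary because their expanded forms already occur elsewhere.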

    Code 2: Simply use contractions.fix to expand the whole text.

    Python3

    text = '''She'd like to know how I'd done that!
              She's going to the park and I don't think I'll be home for dinner.
              They're going to the zoo and she'll be home for dinner.'''

    contractions.fix(text)

    Output:

    'she would like to know how I would done that! 
     she is going to the park and I do not think I will be home for dinner.
     they are going to the zoo and she will be home for dinner.'

    Contractions can also be handled with other techniques, such as dictionary mapping, or with the pycontractions library. See the pycontractions documentation to learn more: https://pypi.org/project/pycontraction/
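
The dictionary-mapping idea mentioned above could be sketched as follows with only the standard library. `CONTRACTION_MAP` and `expand_with_map` are hypothetical names for this example, and the mapping is a small hand-written sample; a usable one would need many more entries.

```python
import re

# A small, hand-written sample mapping (an assumption for this sketch).
CONTRACTION_MAP = {
    "can't": "cannot",
    "won't": "will not",
    "i'll": "I will",
    "it's": "it is",
    "we've": "we have",
    "shouldn't": "should not",
}

# One alternation of all keys, longest first, matched case-insensitively
# on word boundaries.
_keys = sorted(CONTRACTION_MAP, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in _keys) + r")\b",
    flags=re.IGNORECASE,
)

def expand_with_map(text):
    """Replace every contraction found in CONTRACTION_MAP with its expansion."""
    return _PATTERN.sub(lambda m: CONTRACTION_MAP[m.group(0).lower()], text)

print(expand_with_map("I'll be there, but we've said it's late and I can't stay."))
# prints: I will be there, but we have said it is late and I cannot stay.
```

Note that this sketch does not preserve capitalization: "It's" at the start of a sentence becomes "it is".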


    I would do something like this.

    import re

    def remove_contraction_apostraphes(input):
        # Join the letters on either side of an apostrophe or backtick
        text = re.sub('([A-Za-z]+)[\'`]([A-Za-z]+)', r'\1'r'\2', input)
        return text

    print(remove_contraction_apostraphes("can't"))  # prints: cant
    

    1. It matches one or more letters: ([A-Za-z]+)
      • anything in square brackets means "any one of these characters"; the plus means one or more of the thing that comes before it
    2. followed by one of the following: ' or `
    3. followed by one or more letters: ([A-Za-z]+)

    and replaces it with

    1. what was found in the first set of parentheses: \1
      • r'\1' returns the text matched by the first ()
    2. followed by what was found in the second set of parentheses: \2
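
To make the steps above concrete, here is a small sketch that inspects the two groups directly. The sample sentence is an assumption for illustration; the pattern is the same one used in the function above, written as a raw string.

```python
import re

pattern = re.compile(r"([A-Za-z]+)['`]([A-Za-z]+)")

match = pattern.search("they can't come")
print(match.group(0))  # whole match: can't
print(match.group(1))  # first set of parentheses: can
print(match.group(2))  # second set of parentheses: t

# \1\2 glues the two groups back together without the apostrophe.
print(pattern.sub(r"\1\2", "they can't come"))  # they cant come
```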

    If you have other characters, such as �, and you know what all of them are, you can put them inside the square brackets. The line below matches any of those characters and also allows for optional whitespace around the apostrophe:

    text = re.sub(r"([A-Za-z]+)\s?['`�]\s?([A-Za-z]+)", r'\1'r'\2', input)

    • \s : any whitespace character
    • ? : zero or one of the preceding item
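
A minimal, self-contained sketch of this variant. The set of apostrophe-like characters in `APOSTROPHES` is an assumption (straight quote, backtick, and the right single quotation mark U+2019); replace it with whatever characters actually occur in your data. `strip_apostrophes` is a hypothetical helper name.

```python
import re

# Assumed set of apostrophe-like characters for this sketch:
# straight quote, backtick, and the right single quotation mark (U+2019).
APOSTROPHES = "'`\u2019"

pattern = re.compile(
    r"([A-Za-z]+)\s?[" + re.escape(APOSTROPHES) + r"]\s?([A-Za-z]+)"
)

def strip_apostrophes(text):
    """Join the letters on both sides of an apostrophe,
    tolerating one whitespace character on either side."""
    return pattern.sub(r"\1\2", text)

for sample in ["can't", "can 't", "can' t", "can\u2019t", "can`t"]:
    print(repr(sample), "->", strip_apostrophes(sample))  # each gives: cant
```

One caveat of allowing whitespace: an opening quote after a word is also matched, so "said 'hello'" becomes "saidhello'".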

    You could also use a pattern that matches any number of characters, followed by any single character that is not a letter or a digit, followed by any number of characters. If you want to add … to that, I recommend also adding …, …, …, … to your regex to make it …


    This will match any contraction, regardless of which letters come before or after the apostrophe. You will need to put every apostrophe variant you have inside the ['`] block.
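
One possible reading of the broader pattern described above, where the separator is any single character that is not a letter or a digit. The exact regular expression is not preserved in the text, so both patterns below (`broad` and `narrower`) are assumptions made for illustration.

```python
import re

# Assumed pattern: letters, then one character that is not a letter or digit,
# then letters.
broad = re.compile(r"([A-Za-z]+)[^A-Za-z0-9]([A-Za-z]+)")

print(broad.sub(r"\1\2", "can't"))       # cant
print(broad.sub(r"\1\2", "can\u2019t"))  # cant
# A space is also "not a letter or digit", so separate words get merged too.
print(broad.sub(r"\1\2", "do not"))      # donot

# Excluding whitespace from the separator (\s inside the negated class)
# avoids merging separate words.
narrower = re.compile(r"([A-Za-z]+)[^A-Za-z0-9\s]([A-Za-z]+)")
print(narrower.sub(r"\1\2", "I can't do that"))  # I cant do that
```

The trade-off is that the broad pattern needs no list of apostrophe characters, but it also removes hyphens and other punctuation between letters, so an explicit character list is safer when you know which apostrophes occur.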